{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"../img/ods_stickers.jpg\" />"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Exploratory Data Analysis with Pandas"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
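    "In this lab, we analyze a telecom operator's customer churn dataset to get familiar with common Pandas methods for data exploration, and then build a simple model to predict customer churn."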
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Key Topics"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Sorting\n",
    "- Indexing\n",
    "- Crosstabs\n",
    "- Pivot tables\n",
    "- Data exploration"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "---"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
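    "<div style=\"background-color: #FFEAC9; margin-bottom: 10px; padding: 2%; border-radius: 6px;\"><p style=\"line-height: 1.5; font-weight: bold;\"> Note on the Chinese edition</p><p style=\"line-height: 1.5;\">The Chinese edition of this course was compiled and produced, with the original author's permission, by <a href=\"https://www.shiyanlou.com\"><i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> 实验楼 (Shiyanlou)</i></a>. You can launch a Jupyter Notebook environment with one click from the <a href=\"https://www.shiyanlou.com/courses/1283\"><i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Shiyanlou course page</i></a>. If you have questions, please raise them on the course page and we will revise the material promptly. Redistribution of the Chinese edition must comply with the original open-source license and retain the attribution to Shiyanlou as translator, together with the links.</p></div>"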
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Main Pandas Methods"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
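    "Pandas is a library built on top of NumPy that provides a wide range of methods for data exploration. It lets you process and analyze data in formats such as .csv, .tsv, and .xlsx in a SQL-like way."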
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The main data structures in Pandas are the Series and DataFrame classes. Here is a brief introduction to both:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
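    "- A Series is a one-dimensional, array-like object consisting of a sequence of values (of any NumPy data type) and an associated set of data labels, called its index.\n",
    "- A DataFrame is a two-dimensional data structure, that is, a table in which all values within a given column share one type (different columns may have different types). You can think of it as a dictionary of Series instances."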
   ]
  },
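  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the two structures concrete, here is a minimal sketch. The names `minutes` and `toy`, and the values in them, are made up for illustration and are not part of the churn dataset.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# A Series: values plus an index of labels (hypothetical data).\n",
    "minutes = pd.Series([265.1, 161.6, 243.4], index=['a', 'b', 'c'])\n",
    "\n",
    "# A DataFrame built from a dict of equal-length columns;\n",
    "# each column keeps its own dtype.\n",
    "toy = pd.DataFrame({'State': ['KS', 'OH', 'NJ'], 'Minutes': [265.1, 161.6, 243.4]})\n",
    "\n",
    "print(minutes['b'])        # label-based lookup on the Series\n",
    "print(type(toy['State']))  # selecting one column yields a Series\n",
    "print(toy.dtypes)          # one dtype per column\n",
    "```\n",
    "\n",
    "The snippet is shown as text rather than as a code cell so that it does not interfere with the variables used in the rest of the lab."
   ]
  },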
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's begin the lab. We will demonstrate the main Pandas methods by analyzing a telecom operator's customer churn dataset."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, import the required libraries, NumPy and Pandas."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-arrow-circle-down\" aria-hidden=\"true\"> Tutorial code:</i>"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Read the data with the `read_csv()` function, then use the `head()` method to view the first 5 rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>False</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  ...    Customer service calls  Churn\n",
       "0    KS             128  ...                         1  False\n",
       "1    OH             107  ...                         1  False\n",
       "2    NJ             137  ...                         0  False\n",
       "3    OH              84  ...                         2  False\n",
       "4    OK              75  ...                         3  False\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv(\"../../data/telecom_churn.csv\")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the table above, each row corresponds to one customer and each column to one feature of that customer."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's look at the dimensions, feature names, and feature types of this dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(3333, 20)"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result shows that the table contains 3333 rows and 20 columns. Next, let's print the column names."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Index(['State', 'Account length', 'Area code', 'International plan',\n",
       "       'Voice mail plan', 'Number vmail messages', 'Total day minutes',\n",
       "       'Total day calls', 'Total day charge', 'Total eve minutes',\n",
       "       'Total eve calls', 'Total eve charge', 'Total night minutes',\n",
       "       'Total night calls', 'Total night charge', 'Total intl minutes',\n",
       "       'Total intl calls', 'Total intl charge', 'Customer service calls',\n",
       "       'Churn'],\n",
       "      dtype='object')"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.columns"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also use the `info()` method to print general information about the DataFrame."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 3333 entries, 0 to 3332\n",
      "Data columns (total 20 columns):\n",
      "State                     3333 non-null object\n",
      "Account length            3333 non-null int64\n",
      "Area code                 3333 non-null int64\n",
      "International plan        3333 non-null object\n",
      "Voice mail plan           3333 non-null object\n",
      "Number vmail messages     3333 non-null int64\n",
      "Total day minutes         3333 non-null float64\n",
      "Total day calls           3333 non-null int64\n",
      "Total day charge          3333 non-null float64\n",
      "Total eve minutes         3333 non-null float64\n",
      "Total eve calls           3333 non-null int64\n",
      "Total eve charge          3333 non-null float64\n",
      "Total night minutes       3333 non-null float64\n",
      "Total night calls         3333 non-null int64\n",
      "Total night charge        3333 non-null float64\n",
      "Total intl minutes        3333 non-null float64\n",
      "Total intl calls          3333 non-null int64\n",
      "Total intl charge         3333 non-null float64\n",
      "Customer service calls    3333 non-null int64\n",
      "Churn                     3333 non-null bool\n",
      "dtypes: bool(1), float64(8), int64(8), object(3)\n",
      "memory usage: 498.1+ KB\n"
     ]
    }
   ],
   "source": [
    "df.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
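    "`bool`, `int64`, `float64`, and `object` are the data types of the features in this dataset. This method also reveals whether any values are missing. Here there are none: every column contains 3333 non-null observations, which matches the row count we got earlier from the `shape` attribute."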
   ]
  },
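  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a cross-check on what `info()` reports, missing values can also be counted directly with `isna().sum()`, which returns the number of missing entries per column. The sketch below uses a made-up frame; `toy` and its values are hypothetical.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical frame with one missing value in column 'b'.\n",
    "toy = pd.DataFrame({'a': [1, 2, 3], 'b': [0.5, np.nan, 1.5]})\n",
    "\n",
    "# isna() marks missing cells as True; sum() counts them column by column.\n",
    "missing_per_column = toy.isna().sum()\n",
    "print(missing_per_column)\n",
    "```\n",
    "\n",
    "On the churn data, `df.isna().sum()` should report zero for every column, consistent with the `info()` output above."
   ]
  },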
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `astype()` method changes the type of a column. The following code converts the `Churn` feature to `int64`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "df[\"Churn\"] = df[\"Churn\"].astype(\"int64\")"
   ]
  },
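  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One practical effect of this conversion is that `True` becomes 1 and `False` becomes 0, so the mean of the column equals the share of 1s. A minimal sketch with made-up values (the Series `flags` is hypothetical):\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "flags = pd.Series([True, False, False, True])  # hypothetical churn flags\n",
    "as_int = flags.astype('int64')                 # True -> 1, False -> 0\n",
    "\n",
    "print(as_int.tolist())\n",
    "print(as_int.mean())  # fraction of True values in the original Series\n",
    "```\n",
    "\n",
    "Read this way, the mean of `Churn` in the `describe()` output is the overall churn rate of the dataset."
   ]
  },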
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
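    "The `describe()` method shows basic statistics for the numerical features (`int64` and `float64`): the count of non-missing values, mean, standard deviation, minimum and maximum, and the quartiles (including the median)."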
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "      <td>3333.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>mean</th>\n",
       "      <td>101.064806</td>\n",
       "      <td>437.182418</td>\n",
       "      <td>8.099010</td>\n",
       "      <td>179.775098</td>\n",
       "      <td>100.435644</td>\n",
       "      <td>30.562307</td>\n",
       "      <td>200.980348</td>\n",
       "      <td>100.114311</td>\n",
       "      <td>17.083540</td>\n",
       "      <td>200.872037</td>\n",
       "      <td>100.107711</td>\n",
       "      <td>9.039325</td>\n",
       "      <td>10.237294</td>\n",
       "      <td>4.479448</td>\n",
       "      <td>2.764581</td>\n",
       "      <td>1.562856</td>\n",
       "      <td>0.144914</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>std</th>\n",
       "      <td>39.822106</td>\n",
       "      <td>42.371290</td>\n",
       "      <td>13.688365</td>\n",
       "      <td>54.467389</td>\n",
       "      <td>20.069084</td>\n",
       "      <td>9.259435</td>\n",
       "      <td>50.713844</td>\n",
       "      <td>19.922625</td>\n",
       "      <td>4.310668</td>\n",
       "      <td>50.573847</td>\n",
       "      <td>19.568609</td>\n",
       "      <td>2.275873</td>\n",
       "      <td>2.791840</td>\n",
       "      <td>2.461214</td>\n",
       "      <td>0.753773</td>\n",
       "      <td>1.315491</td>\n",
       "      <td>0.352067</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>min</th>\n",
       "      <td>1.000000</td>\n",
       "      <td>408.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>23.200000</td>\n",
       "      <td>33.000000</td>\n",
       "      <td>1.040000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>25%</th>\n",
       "      <td>74.000000</td>\n",
       "      <td>408.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>143.700000</td>\n",
       "      <td>87.000000</td>\n",
       "      <td>24.430000</td>\n",
       "      <td>166.600000</td>\n",
       "      <td>87.000000</td>\n",
       "      <td>14.160000</td>\n",
       "      <td>167.000000</td>\n",
       "      <td>87.000000</td>\n",
       "      <td>7.520000</td>\n",
       "      <td>8.500000</td>\n",
       "      <td>3.000000</td>\n",
       "      <td>2.300000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>50%</th>\n",
       "      <td>101.000000</td>\n",
       "      <td>415.000000</td>\n",
       "      <td>0.000000</td>\n",
       "      <td>179.400000</td>\n",
       "      <td>101.000000</td>\n",
       "      <td>30.500000</td>\n",
       "      <td>201.400000</td>\n",
       "      <td>100.000000</td>\n",
       "      <td>17.120000</td>\n",
       "      <td>201.200000</td>\n",
       "      <td>100.000000</td>\n",
       "      <td>9.050000</td>\n",
       "      <td>10.300000</td>\n",
       "      <td>4.000000</td>\n",
       "      <td>2.780000</td>\n",
       "      <td>1.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>75%</th>\n",
       "      <td>127.000000</td>\n",
       "      <td>510.000000</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>216.400000</td>\n",
       "      <td>114.000000</td>\n",
       "      <td>36.790000</td>\n",
       "      <td>235.300000</td>\n",
       "      <td>114.000000</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>235.300000</td>\n",
       "      <td>113.000000</td>\n",
       "      <td>10.590000</td>\n",
       "      <td>12.100000</td>\n",
       "      <td>6.000000</td>\n",
       "      <td>3.270000</td>\n",
       "      <td>2.000000</td>\n",
       "      <td>0.000000</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>max</th>\n",
       "      <td>243.000000</td>\n",
       "      <td>510.000000</td>\n",
       "      <td>51.000000</td>\n",
       "      <td>350.800000</td>\n",
       "      <td>165.000000</td>\n",
       "      <td>59.640000</td>\n",
       "      <td>363.700000</td>\n",
       "      <td>170.000000</td>\n",
       "      <td>30.910000</td>\n",
       "      <td>395.000000</td>\n",
       "      <td>175.000000</td>\n",
       "      <td>17.770000</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>20.000000</td>\n",
       "      <td>5.400000</td>\n",
       "      <td>9.000000</td>\n",
       "      <td>1.000000</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       Account length     ...             Churn\n",
       "count     3333.000000     ...       3333.000000\n",
       "mean       101.064806     ...          0.144914\n",
       "std         39.822106     ...          0.352067\n",
       "min          1.000000     ...          0.000000\n",
       "25%         74.000000     ...          0.000000\n",
       "50%        101.000000     ...          0.000000\n",
       "75%        127.000000     ...          0.000000\n",
       "max        243.000000     ...          1.000000\n",
       "\n",
       "[8 rows x 17 columns]"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see statistics for non-numerical features, explicitly specify the data types to include with the `include` parameter."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>count</th>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "      <td>3333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>unique</th>\n",
       "      <td>51</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>top</th>\n",
       "      <td>WV</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>freq</th>\n",
       "      <td>106</td>\n",
       "      <td>3010</td>\n",
       "      <td>2411</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "       State International plan Voice mail plan\n",
       "count   3333               3333            3333\n",
       "unique    51                  2               2\n",
       "top       WV                 No              No\n",
       "freq     106               3010            2411"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.describe(include=[\"object\", \"bool\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For categorical (type `object`) and boolean (type `bool`) features, the `value_counts()` method shows how often each value occurs. Let's look at the distribution of `Churn`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    2850\n",
       "1     483\n",
       "Name: Churn, dtype: int64"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Churn\"].value_counts()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result shows that 2850 of the 3333 customers are loyal: their `Churn` value is 0. Passing `normalize=True` to `value_counts()` shows proportions instead of counts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0    0.855086\n",
       "1    0.144914\n",
       "Name: Churn, dtype: float64"
      ]
     },
     "execution_count": 10,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Churn\"].value_counts(normalize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Sorting"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
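    "A DataFrame can be sorted by the values of a variable (that is, a column). For example, we can sort by `Total day charge`, setting `ascending=False` for descending order."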
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>365</th>\n",
       "      <td>CO</td>\n",
       "      <td>154</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>350.8</td>\n",
       "      <td>75</td>\n",
       "      <td>59.64</td>\n",
       "      <td>216.5</td>\n",
       "      <td>94</td>\n",
       "      <td>18.40</td>\n",
       "      <td>253.9</td>\n",
       "      <td>100</td>\n",
       "      <td>11.43</td>\n",
       "      <td>10.1</td>\n",
       "      <td>9</td>\n",
       "      <td>2.73</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>985</th>\n",
       "      <td>NY</td>\n",
       "      <td>64</td>\n",
       "      <td>415</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>346.8</td>\n",
       "      <td>55</td>\n",
       "      <td>58.96</td>\n",
       "      <td>249.5</td>\n",
       "      <td>79</td>\n",
       "      <td>21.21</td>\n",
       "      <td>275.4</td>\n",
       "      <td>102</td>\n",
       "      <td>12.39</td>\n",
       "      <td>13.3</td>\n",
       "      <td>9</td>\n",
       "      <td>3.59</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2594</th>\n",
       "      <td>OH</td>\n",
       "      <td>115</td>\n",
       "      <td>510</td>\n",
       "      <td>Yes</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>345.3</td>\n",
       "      <td>81</td>\n",
       "      <td>58.70</td>\n",
       "      <td>203.4</td>\n",
       "      <td>106</td>\n",
       "      <td>17.29</td>\n",
       "      <td>217.5</td>\n",
       "      <td>107</td>\n",
       "      <td>9.79</td>\n",
       "      <td>11.8</td>\n",
       "      <td>8</td>\n",
       "      <td>3.19</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>156</th>\n",
       "      <td>OH</td>\n",
       "      <td>83</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>337.4</td>\n",
       "      <td>120</td>\n",
       "      <td>57.36</td>\n",
       "      <td>227.4</td>\n",
       "      <td>116</td>\n",
       "      <td>19.33</td>\n",
       "      <td>153.9</td>\n",
       "      <td>114</td>\n",
       "      <td>6.93</td>\n",
       "      <td>15.8</td>\n",
       "      <td>7</td>\n",
       "      <td>4.27</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>605</th>\n",
       "      <td>MO</td>\n",
       "      <td>112</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>335.5</td>\n",
       "      <td>77</td>\n",
       "      <td>57.04</td>\n",
       "      <td>212.5</td>\n",
       "      <td>109</td>\n",
       "      <td>18.06</td>\n",
       "      <td>265.0</td>\n",
       "      <td>132</td>\n",
       "      <td>11.93</td>\n",
       "      <td>12.7</td>\n",
       "      <td>8</td>\n",
       "      <td>3.43</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     State  Account length  ...   Customer service calls Churn\n",
       "365     CO             154  ...                        1     1\n",
       "985     NY              64  ...                        1     1\n",
       "2594    OH             115  ...                        1     1\n",
       "156     OH              83  ...                        0     1\n",
       "605     MO             112  ...                        2     1\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by=\"Total day charge\", ascending=False).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We can also sort by multiple columns. The code below sorts first by `Churn` in ascending order and then by `Total day charge` in descending order, with `Churn` taking priority over `Total day charge`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>688</th>\n",
       "      <td>MN</td>\n",
       "      <td>13</td>\n",
       "      <td>510</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>21</td>\n",
       "      <td>315.6</td>\n",
       "      <td>105</td>\n",
       "      <td>53.65</td>\n",
       "      <td>208.9</td>\n",
       "      <td>71</td>\n",
       "      <td>17.76</td>\n",
       "      <td>260.1</td>\n",
       "      <td>123</td>\n",
       "      <td>11.70</td>\n",
       "      <td>12.1</td>\n",
       "      <td>3</td>\n",
       "      <td>3.27</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2259</th>\n",
       "      <td>NC</td>\n",
       "      <td>210</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>31</td>\n",
       "      <td>313.8</td>\n",
       "      <td>87</td>\n",
       "      <td>53.35</td>\n",
       "      <td>147.7</td>\n",
       "      <td>103</td>\n",
       "      <td>12.55</td>\n",
       "      <td>192.7</td>\n",
       "      <td>97</td>\n",
       "      <td>8.67</td>\n",
       "      <td>10.1</td>\n",
       "      <td>7</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>534</th>\n",
       "      <td>LA</td>\n",
       "      <td>67</td>\n",
       "      <td>510</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>310.4</td>\n",
       "      <td>97</td>\n",
       "      <td>52.77</td>\n",
       "      <td>66.5</td>\n",
       "      <td>123</td>\n",
       "      <td>5.65</td>\n",
       "      <td>246.5</td>\n",
       "      <td>99</td>\n",
       "      <td>11.09</td>\n",
       "      <td>9.2</td>\n",
       "      <td>10</td>\n",
       "      <td>2.48</td>\n",
       "      <td>4</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>575</th>\n",
       "      <td>SD</td>\n",
       "      <td>114</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>36</td>\n",
       "      <td>309.9</td>\n",
       "      <td>90</td>\n",
       "      <td>52.68</td>\n",
       "      <td>200.3</td>\n",
       "      <td>89</td>\n",
       "      <td>17.03</td>\n",
       "      <td>183.5</td>\n",
       "      <td>105</td>\n",
       "      <td>8.26</td>\n",
       "      <td>14.2</td>\n",
       "      <td>2</td>\n",
       "      <td>3.83</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2858</th>\n",
       "      <td>AL</td>\n",
       "      <td>141</td>\n",
       "      <td>510</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>28</td>\n",
       "      <td>308.0</td>\n",
       "      <td>123</td>\n",
       "      <td>52.36</td>\n",
       "      <td>247.8</td>\n",
       "      <td>128</td>\n",
       "      <td>21.06</td>\n",
       "      <td>152.9</td>\n",
       "      <td>103</td>\n",
       "      <td>6.88</td>\n",
       "      <td>7.4</td>\n",
       "      <td>3</td>\n",
       "      <td>2.00</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     State  Account length  ...   Customer service calls Churn\n",
       "688     MN              13  ...                        3     0\n",
       "2259    NC             210  ...                        3     0\n",
       "534     LA              67  ...                        4     0\n",
       "575     SD             114  ...                        1     0\n",
       "2858    AL             141  ...                        1     0\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.sort_values(by=[\"Churn\", \"Total day charge\"], ascending=[True, False]).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 索引和获取数据"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame 可以以不同的方式进行索引。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "使用 `DataFrame['Name']` 可以得到一个单独的列。比如，离网率有多高？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0.14491449144914492"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Churn\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "对一家公司而言，14.5% 的离网率是一个很糟糕的数据，这么高的离网率可能导致公司破产。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "布尔值索引同样很方便，语法是 `df[P(df['Name'])]`，P 是在检查 Name 列每个元素时所使用的逻辑条件。这一索引的输出是 DataFrame 的 Name 列中满足 P 条件的行。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "让我们使用布尔值索引来回答这样以下问题：离网用户的数值变量的均值是多少？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Account length            102.664596\n",
       "Area code                 437.817805\n",
       "Number vmail messages       5.115942\n",
       "Total day minutes         206.914079\n",
       "Total day calls           101.335404\n",
       "Total day charge           35.175921\n",
       "Total eve minutes         212.410145\n",
       "Total eve calls           100.561077\n",
       "Total eve charge           18.054969\n",
       "Total night minutes       205.231677\n",
       "Total night calls         100.399586\n",
       "Total night charge          9.235528\n",
       "Total intl minutes         10.700000\n",
       "Total intl calls            4.163561\n",
       "Total intl charge           2.889545\n",
       "Customer service calls      2.229814\n",
       "Churn                       1.000000\n",
       "dtype: float64"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"Churn\"] == 1].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "离网用户在白天打电话的总时长的均值是多少？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "206.91407867494814"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"Churn\"] == 1][\"Total day minutes\"].mean()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "未使用国际套餐（`International plan == NO`）的忠实用户（`Churn == 0`）所打的最长的国际长途是多久？"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "18.9"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[(df[\"Churn\"] == 0) & (df[\"International plan\"] == \"No\")][\"Total intl minutes\"].max()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "DataFrame 可以通过列名、行名、行号进行索引。`loc` 方法为通过名称索引，`iloc` 方法为通过数字索引。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过 `loc` 方法输出 0 至 5 行、State 州 至 Area code 区号 的数据。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>AL</td>\n",
       "      <td>118</td>\n",
       "      <td>510</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  Area code\n",
       "0    KS             128        415\n",
       "1    OH             107        415\n",
       "2    NJ             137        415\n",
       "3    OH              84        408\n",
       "4    OK              75        415\n",
       "5    AL             118        510"
      ]
     },
     "execution_count": 17,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.loc[0:5, \"State\":\"Area code\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "通过 `ilo` 方法输出前 5 行的前 3 列数据（和典型的 Python 切片一样，不含最大值）。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  Area code\n",
       "0    KS             128        415\n",
       "1    OH             107        415\n",
       "2    NJ             137        415\n",
       "3    OH              84        408\n",
       "4    OK              75        415"
      ]
     },
     "execution_count": 18,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.iloc[0:5, 0:3]"
   ]
  },
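  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The end-point behaviour of the two indexers is easiest to see on an index that is not 0, 1, 2, and so on. The following is a minimal sketch on a made-up three-row frame; the name `toy`, its row labels, and its values are assumptions for illustration and are not part of the churn dataset. A `loc` slice includes its end label, while an `iloc` slice excludes its end position.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame with string row labels (hypothetical data, for illustration only)\n",
    "toy = pd.DataFrame(\n",
    "    {\"State\": [\"KS\", \"OH\", \"NJ\"], \"Area code\": [415, 415, 408]},\n",
    "    index=[\"a\", \"b\", \"c\"],\n",
    ")\n",
    "\n",
    "# Label-based slice: both end points (\"a\" and \"b\") are included\n",
    "by_label = toy.loc[\"a\":\"b\", \"State\":\"Area code\"]\n",
    "\n",
    "# Position-based slice: position 2 is excluded, as in ordinary Python slicing\n",
    "by_position = toy.iloc[0:2, 0:2]\n",
    "\n",
    "# Both selections contain rows \"a\" and \"b\" and both columns\n",
    "print(by_label.equals(by_position))\n",
    "```"
   ]
  },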
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`df[:1]` 和 `df[-1:]` 可以得到 DataFrame 的首行和末行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>3332</th>\n",
       "      <td>TN</td>\n",
       "      <td>74</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>25</td>\n",
       "      <td>234.4</td>\n",
       "      <td>113</td>\n",
       "      <td>39.85</td>\n",
       "      <td>265.9</td>\n",
       "      <td>82</td>\n",
       "      <td>22.6</td>\n",
       "      <td>241.4</td>\n",
       "      <td>77</td>\n",
       "      <td>10.86</td>\n",
       "      <td>13.7</td>\n",
       "      <td>4</td>\n",
       "      <td>3.7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     State  Account length  ...   Customer service calls Churn\n",
       "3332    TN              74  ...                        0     0\n",
       "\n",
       "[1 rows x 20 columns]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[-1:]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 应用函数到单元格、列、行"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面通过 `apply()` 方法应用函数 `max` 至每一列，即输出每列的最大值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "State                        WY\n",
       "Account length              243\n",
       "Area code                   510\n",
       "International plan          Yes\n",
       "Voice mail plan             Yes\n",
       "Number vmail messages        51\n",
       "Total day minutes         350.8\n",
       "Total day calls             165\n",
       "Total day charge          59.64\n",
       "Total eve minutes         363.7\n",
       "Total eve calls             170\n",
       "Total eve charge          30.91\n",
       "Total night minutes         395\n",
       "Total night calls           175\n",
       "Total night charge        17.77\n",
       "Total intl minutes           20\n",
       "Total intl calls             20\n",
       "Total intl charge           5.4\n",
       "Customer service calls        9\n",
       "Churn                         1\n",
       "dtype: object"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.apply(np.max)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`apply()` 方法也可以应用函数至每一行，指定 axis=1 即可。在这种情况下，使用 `lambda` 函数十分方便。比如，下面函数选中了所有以 W 开头的州。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>WV</td>\n",
       "      <td>141</td>\n",
       "      <td>415</td>\n",
       "      <td>Yes</td>\n",
       "      <td>Yes</td>\n",
       "      <td>37</td>\n",
       "      <td>258.6</td>\n",
       "      <td>84</td>\n",
       "      <td>43.96</td>\n",
       "      <td>222.0</td>\n",
       "      <td>111</td>\n",
       "      <td>18.87</td>\n",
       "      <td>326.4</td>\n",
       "      <td>97</td>\n",
       "      <td>14.69</td>\n",
       "      <td>11.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.02</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>26</th>\n",
       "      <td>WY</td>\n",
       "      <td>57</td>\n",
       "      <td>408</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>39</td>\n",
       "      <td>213.0</td>\n",
       "      <td>115</td>\n",
       "      <td>36.21</td>\n",
       "      <td>191.1</td>\n",
       "      <td>112</td>\n",
       "      <td>16.24</td>\n",
       "      <td>182.7</td>\n",
       "      <td>115</td>\n",
       "      <td>8.22</td>\n",
       "      <td>9.5</td>\n",
       "      <td>3</td>\n",
       "      <td>2.57</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>44</th>\n",
       "      <td>WI</td>\n",
       "      <td>64</td>\n",
       "      <td>510</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>154.0</td>\n",
       "      <td>67</td>\n",
       "      <td>26.18</td>\n",
       "      <td>225.8</td>\n",
       "      <td>118</td>\n",
       "      <td>19.19</td>\n",
       "      <td>265.3</td>\n",
       "      <td>86</td>\n",
       "      <td>11.94</td>\n",
       "      <td>3.5</td>\n",
       "      <td>3</td>\n",
       "      <td>0.95</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>49</th>\n",
       "      <td>WY</td>\n",
       "      <td>97</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>Yes</td>\n",
       "      <td>24</td>\n",
       "      <td>133.2</td>\n",
       "      <td>135</td>\n",
       "      <td>22.64</td>\n",
       "      <td>217.2</td>\n",
       "      <td>58</td>\n",
       "      <td>18.46</td>\n",
       "      <td>70.6</td>\n",
       "      <td>79</td>\n",
       "      <td>3.18</td>\n",
       "      <td>11.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.97</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>54</th>\n",
       "      <td>WY</td>\n",
       "      <td>87</td>\n",
       "      <td>415</td>\n",
       "      <td>No</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>151.0</td>\n",
       "      <td>83</td>\n",
       "      <td>25.67</td>\n",
       "      <td>219.7</td>\n",
       "      <td>116</td>\n",
       "      <td>18.67</td>\n",
       "      <td>203.9</td>\n",
       "      <td>127</td>\n",
       "      <td>9.18</td>\n",
       "      <td>9.7</td>\n",
       "      <td>3</td>\n",
       "      <td>2.62</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   State  Account length  ...   Customer service calls Churn\n",
       "9     WV             141  ...                        0     0\n",
       "26    WY              57  ...                        0     0\n",
       "44    WI              64  ...                        1     0\n",
       "49    WY              97  ...                        1     0\n",
       "54    WY              87  ...                        5     1\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[df[\"State\"].apply(lambda state: state[0] == \"W\")].head()"
   ]
  },
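  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The cell above calls `apply()` on a single column, so the lambda receives one value at a time. For a row-wise call on a DataFrame, `axis=1` passes each row to the function as a Series indexed by column name. The following is a minimal sketch on a made-up two-row frame; the `toy` frame and the `Total charge` column are assumptions for illustration and are not part of the original analysis.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy frame with two charge columns (hypothetical data, for illustration only)\n",
    "toy = pd.DataFrame(\n",
    "    {\"Total day charge\": [45.07, 27.47], \"Total eve charge\": [16.78, 16.62]}\n",
    ")\n",
    "\n",
    "# axis=1: the lambda is called once per row; `row` is a Series indexed by column name\n",
    "toy[\"Total charge\"] = toy.apply(\n",
    "    lambda row: row[\"Total day charge\"] + row[\"Total eve charge\"], axis=1\n",
    ")\n",
    "\n",
    "print(toy)\n",
    "```\n",
    "\n",
    "For plain arithmetic like this, adding the two columns directly is usually faster; row-wise `apply()` is mainly useful for logic that is hard to express with whole-column operations."
   ]
  },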
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`map()` 方法可以通过一个 {old_value:new_value} 形式的字典替换某一列中的值。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>Yes</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>Yes</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>True</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>True</td>\n",
       "      <td>No</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  ...    Customer service calls  Churn\n",
       "0    KS             128  ...                         1      0\n",
       "1    OH             107  ...                         1      0\n",
       "2    NJ             137  ...                         0      0\n",
       "3    OH              84  ...                         2      0\n",
       "4    OK              75  ...                         3      0\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "d = {\"No\": False, \"Yes\": True}\n",
    "df[\"International plan\"] = df[\"International plan\"].map(d)\n",
    "df.head()"
   ]
  },
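  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "One caveat with a dictionary passed to `map()`: a value that is not a key of the dictionary is mapped to NaN. The following is a minimal sketch on a made-up Series; the `plan` name and the \"Maybe\" value are assumptions for illustration and do not occur in the churn dataset.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Toy column with one value (\"Maybe\") that is missing from the mapping (hypothetical data)\n",
    "plan = pd.Series([\"No\", \"Yes\", \"Maybe\"])\n",
    "d = {\"No\": False, \"Yes\": True}\n",
    "\n",
    "# \"Maybe\" is not a key of d, so map() turns it into NaN\n",
    "mapped = plan.map(d)\n",
    "\n",
    "print(mapped.tolist())\n",
    "print(mapped.isna().sum())  # number of values that had no entry in d\n",
    "```\n",
    "\n",
    "Checking `isna().sum()` after a `map()` call is a cheap way to confirm that the dictionary covered every value in the column."
   ]
  },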
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "当然，使用 `repalce()` 方法一样可以达到替换的目的。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  ...    Customer service calls  Churn\n",
       "0    KS             128  ...                         1      0\n",
       "1    OH             107  ...                         1      0\n",
       "2    NJ             137  ...                         0      0\n",
       "3    OH              84  ...                         2      0\n",
       "4    OK              75  ...                         3      0\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 23,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = df.replace({\"Voice mail plan\": d})\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Grouping (Groupby)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In pandas, grouping data generally takes the following form:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```python\n",
    "df.groupby(by=grouping_columns)[columns_to_show].function()\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The parts of this expression are:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
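    "- The `groupby()` method splits the data into groups according to the values of `grouping_columns`.\n",
    "- Next, the columns of interest (`columns_to_show`) are selected. If this step is omitted, all non-groupby columns are included (that is, every column except `grouping_columns`).\n",
    "- Finally, one or more functions (`function`) are applied to each group."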
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the example below, we group the data by the value of the `Churn` variable and display summary statistics for each group."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"6\" halign=\"left\">Total day minutes</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Total eve minutes</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Total night minutes</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>50%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>50%</th>\n",
       "      <th>max</th>\n",
       "      <th>count</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>min</th>\n",
       "      <th>50%</th>\n",
       "      <th>max</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Churn</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2850.0</td>\n",
       "      <td>175.175754</td>\n",
       "      <td>50.181655</td>\n",
       "      <td>0.0</td>\n",
       "      <td>177.2</td>\n",
       "      <td>315.6</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>199.043298</td>\n",
       "      <td>50.292175</td>\n",
       "      <td>0.0</td>\n",
       "      <td>199.6</td>\n",
       "      <td>361.8</td>\n",
       "      <td>2850.0</td>\n",
       "      <td>200.133193</td>\n",
       "      <td>51.105032</td>\n",
       "      <td>23.2</td>\n",
       "      <td>200.25</td>\n",
       "      <td>395.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>483.0</td>\n",
       "      <td>206.914079</td>\n",
       "      <td>68.997792</td>\n",
       "      <td>0.0</td>\n",
       "      <td>217.6</td>\n",
       "      <td>350.8</td>\n",
       "      <td>483.0</td>\n",
       "      <td>212.410145</td>\n",
       "      <td>51.728910</td>\n",
       "      <td>70.9</td>\n",
       "      <td>211.3</td>\n",
       "      <td>363.7</td>\n",
       "      <td>483.0</td>\n",
       "      <td>205.231677</td>\n",
       "      <td>47.132825</td>\n",
       "      <td>47.4</td>\n",
       "      <td>204.80</td>\n",
       "      <td>354.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Total day minutes                         ...   Total night minutes               \n",
       "                  count        mean        std  ...                   min     50%    max\n",
       "Churn                                           ...                                     \n",
       "0                2850.0  175.175754  50.181655  ...                  23.2  200.25  395.0\n",
       "1                 483.0  206.914079  68.997792  ...                  47.4  204.80  354.9\n",
       "\n",
       "[2 rows x 18 columns]"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "columns_to_show = [\"Total day minutes\", \"Total eve minutes\", \"Total night minutes\"]\n",
    "\n",
    "df.groupby([\"Churn\"])[columns_to_show].describe(percentiles=[])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This example is similar to the one above, but this time a list of functions is passed to `agg()`, which aggregates the grouped data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr:last-of-type th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th colspan=\"4\" halign=\"left\">Total day minutes</th>\n",
       "      <th colspan=\"4\" halign=\"left\">Total eve minutes</th>\n",
       "      <th colspan=\"4\" halign=\"left\">Total night minutes</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>amin</th>\n",
       "      <th>amax</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>amin</th>\n",
       "      <th>amax</th>\n",
       "      <th>mean</th>\n",
       "      <th>std</th>\n",
       "      <th>amin</th>\n",
       "      <th>amax</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Churn</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>175.175754</td>\n",
       "      <td>50.181655</td>\n",
       "      <td>0.0</td>\n",
       "      <td>315.6</td>\n",
       "      <td>199.043298</td>\n",
       "      <td>50.292175</td>\n",
       "      <td>0.0</td>\n",
       "      <td>361.8</td>\n",
       "      <td>200.133193</td>\n",
       "      <td>51.105032</td>\n",
       "      <td>23.2</td>\n",
       "      <td>395.0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>206.914079</td>\n",
       "      <td>68.997792</td>\n",
       "      <td>0.0</td>\n",
       "      <td>350.8</td>\n",
       "      <td>212.410145</td>\n",
       "      <td>51.728910</td>\n",
       "      <td>70.9</td>\n",
       "      <td>363.7</td>\n",
       "      <td>205.231677</td>\n",
       "      <td>47.132825</td>\n",
       "      <td>47.4</td>\n",
       "      <td>354.9</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "      Total day minutes                  ...   Total night minutes             \n",
       "                   mean        std amin  ...                   std  amin   amax\n",
       "Churn                                    ...                                   \n",
       "0            175.175754  50.181655  0.0  ...             51.105032  23.2  395.0\n",
       "1            206.914079  68.997792  0.0  ...             47.132825  47.4  354.9\n",
       "\n",
       "[2 rows x 12 columns]"
      ]
     },
     "execution_count": 25,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "columns_to_show = [\"Total day minutes\", \"Total eve minutes\", \"Total night minutes\"]\n",
    "\n",
    "df.groupby([\"Churn\"])[columns_to_show].agg([np.mean, np.std, np.min, np.max])"
   ]
  },
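  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Newer pandas releases may emit a `FutureWarning` when NumPy callables such as `np.mean` are passed to `agg()`, and suggest string names instead. The following is a minimal sketch of that variant on a small made-up DataFrame (`toy` and its values are illustrative assumptions, not part of the churn dataset). With string names the result columns are labelled `min` and `max` rather than `amin` and `amax`.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data standing in for the churn dataset\n",
    "toy = pd.DataFrame(\n",
    "    {\n",
    "        \"Churn\": [0, 0, 1, 1],\n",
    "        \"Total day minutes\": [100.0, 200.0, 150.0, 250.0],\n",
    "    }\n",
    ")\n",
    "\n",
    "# String names select the built-in groupby aggregations\n",
    "summary = toy.groupby(\"Churn\")[\"Total day minutes\"].agg([\"mean\", \"std\", \"min\", \"max\"])\n",
    "print(summary)\n",
    "```"
   ]
  },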
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary tables"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The pivot table, as used in pandas, can be defined as follows:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
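    "> A pivot table is a data summarization tool commonly found in spreadsheet programs and other data analysis software. It aggregates data by one or more keys and arranges the results in a rectangle, with some group keys along the rows and others along the columns."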
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A pivot table can be built with the `pivot_table()` method, which takes the following parameters:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
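    "- `values`: the list of variables to compute statistics for.\n",
    "- `index`: the list of variables to group the data by.\n",
    "- `aggfunc`: the statistics to compute for each group, for example sum, mean, maximum, or minimum."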
   ]
  },
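  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The three parameters above can be sketched with explicit keyword arguments. This is only an illustration on made-up data: `toy` and its values are assumptions, and only the column names mirror the churn dataset. Passing a list to `aggfunc` computes several statistics at once.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; only the column names mirror the churn dataset\n",
    "toy = pd.DataFrame(\n",
    "    {\n",
    "        \"Area code\": [408, 408, 415, 415],\n",
    "        \"Total day calls\": [90, 110, 100, 120],\n",
    "        \"Total eve calls\": [80, 100, 95, 105],\n",
    "    }\n",
    ")\n",
    "\n",
    "# values: columns to summarise; index: grouping keys; aggfunc: statistic(s)\n",
    "table = toy.pivot_table(\n",
    "    values=[\"Total day calls\", \"Total eve calls\"],\n",
    "    index=\"Area code\",\n",
    "    aggfunc=[\"mean\", \"max\"],\n",
    ")\n",
    "print(table)\n",
    "```"
   ]
  },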
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now use `pivot_table()` to look at the mean number of day, evening, and night calls for each area code."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total night calls</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Area code</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>408</th>\n",
       "      <td>100.496420</td>\n",
       "      <td>99.788783</td>\n",
       "      <td>99.039379</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>415</th>\n",
       "      <td>100.576435</td>\n",
       "      <td>100.503927</td>\n",
       "      <td>100.398187</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>510</th>\n",
       "      <td>100.097619</td>\n",
       "      <td>99.671429</td>\n",
       "      <td>100.601190</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "           Total day calls        ...          Total night calls\n",
       "Area code                         ...                           \n",
       "408             100.496420        ...                  99.039379\n",
       "415             100.576435        ...                 100.398187\n",
       "510             100.097619        ...                 100.601190\n",
       "\n",
       "[3 rows x 3 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.pivot_table(\n",
    "    [\"Total day calls\", \"Total eve calls\", \"Total night calls\"],\n",
    "    [\"Area code\"],\n",
    "    aggfunc=\"mean\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For other uses of `pivot_table()`, see the pivot table section of [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Pandas 百题大冲关</i>](https://www.shiyanlou.com/courses/1091/labs/6138/document)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A cross tabulation (contingency table) is a special kind of pivot table that computes group frequencies. In pandas it is usually built with the `crosstab()` function."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Build a cross tabulation to see how the samples are distributed across `Churn` and `International plan`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>International plan</th>\n",
       "      <th>False</th>\n",
       "      <th>True</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Churn</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2664</td>\n",
       "      <td>186</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>346</td>\n",
       "      <td>137</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "International plan  False  True \n",
       "Churn                           \n",
       "0                    2664    186\n",
       "1                     346    137"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df[\"Churn\"], df[\"International plan\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Build a cross tabulation of `Churn` against `Voice mail plan`. With `normalize=True`, each cell shows its share of all samples rather than a count."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>False</th>\n",
       "      <th>True</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Churn</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>0.602460</td>\n",
       "      <td>0.252625</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>0.120912</td>\n",
       "      <td>0.024002</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Voice mail plan     False     True \n",
       "Churn                              \n",
       "0                0.602460  0.252625\n",
       "1                0.120912  0.024002"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df[\"Churn\"], df[\"Voice mail plan\"], normalize=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "These results show that most users are loyal and do not use the additional services (international plan, voice mail)."
   ]
  },
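  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The tables above report counts or shares of the whole sample. To compare groups with each other, `crosstab()` also accepts `normalize=\"index\"` (each row sums to 1) and `margins=True` (adds totals labelled `All`). A minimal sketch on made-up data follows; `toy` and its values are assumptions for illustration only.\n",
    "\n",
    "```python\n",
    "import pandas as pd\n",
    "\n",
    "# Hypothetical toy data; only the column names mirror the churn dataset\n",
    "toy = pd.DataFrame(\n",
    "    {\n",
    "        \"Churn\": [0, 0, 0, 1, 1, 0],\n",
    "        \"International plan\": [False, False, True, True, True, False],\n",
    "    }\n",
    ")\n",
    "\n",
    "# normalize=\"index\" rescales every row so that it sums to 1\n",
    "by_row = pd.crosstab(toy[\"Churn\"], toy[\"International plan\"], normalize=\"index\")\n",
    "\n",
    "# margins=True appends row and column totals labelled \"All\"\n",
    "with_totals = pd.crosstab(toy[\"Churn\"], toy[\"International plan\"], margins=True)\n",
    "\n",
    "print(by_row)\n",
    "print(with_totals)\n",
    "```"
   ]
  },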
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Adding and removing rows and columns in a DataFrame"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "There are many ways to add a column to a DataFrame. For example, use the `insert()` method to add a `Total calls` column holding the total number of calls for each user."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "      <th>Total calls</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>303</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>332</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>333</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>255</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>359</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length     ...       Churn  Total calls\n",
       "0    KS             128     ...           0          303\n",
       "1    OH             107     ...           0          332\n",
       "2    NJ             137     ...           0          333\n",
       "3    OH              84     ...           0          255\n",
       "4    OK              75     ...           0          359\n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "total_calls = (\n",
    "    df[\"Total day calls\"]\n",
    "    + df[\"Total eve calls\"]\n",
    "    + df[\"Total night calls\"]\n",
    "    + df[\"Total intl calls\"]\n",
    ")\n",
    "# loc is the integer position at which the new column is inserted\n",
    "# len(df.columns) places the computed Total calls after the last existing column\n",
    "df.insert(loc=len(df.columns), column=\"Total calls\", value=total_calls)\n",
    "\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The code above creates an intermediate Series, `total_calls`. A column can also be added directly, without creating such an intermediate object."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "      <th>Total calls</th>\n",
       "      <th>Total charge</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>303</td>\n",
       "      <td>75.56</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>OH</td>\n",
       "      <td>107</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>26</td>\n",
       "      <td>161.6</td>\n",
       "      <td>123</td>\n",
       "      <td>27.47</td>\n",
       "      <td>195.5</td>\n",
       "      <td>103</td>\n",
       "      <td>16.62</td>\n",
       "      <td>254.4</td>\n",
       "      <td>103</td>\n",
       "      <td>11.45</td>\n",
       "      <td>13.7</td>\n",
       "      <td>3</td>\n",
       "      <td>3.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>332</td>\n",
       "      <td>59.24</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>NJ</td>\n",
       "      <td>137</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>243.4</td>\n",
       "      <td>114</td>\n",
       "      <td>41.38</td>\n",
       "      <td>121.2</td>\n",
       "      <td>110</td>\n",
       "      <td>10.30</td>\n",
       "      <td>162.6</td>\n",
       "      <td>104</td>\n",
       "      <td>7.32</td>\n",
       "      <td>12.2</td>\n",
       "      <td>5</td>\n",
       "      <td>3.29</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>333</td>\n",
       "      <td>62.29</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "      <td>255</td>\n",
       "      <td>66.80</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "      <td>359</td>\n",
       "      <td>52.09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length      ...       Total calls  Total charge\n",
       "0    KS             128      ...               303         75.56\n",
       "1    OH             107      ...               332         59.24\n",
       "2    NJ             137      ...               333         62.29\n",
       "3    OH              84      ...               255         66.80\n",
       "4    OK              75      ...               359         52.09\n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 30,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Total charge\"] = (\n",
    "    df[\"Total day charge\"]\n",
    "    + df[\"Total eve charge\"]\n",
    "    + df[\"Total night charge\"]\n",
    "    + df[\"Total intl charge\"]\n",
    ")\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use the `drop()` method to delete columns and rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>State</th>\n",
       "      <th>Account length</th>\n",
       "      <th>Area code</th>\n",
       "      <th>International plan</th>\n",
       "      <th>Voice mail plan</th>\n",
       "      <th>Number vmail messages</th>\n",
       "      <th>Total day minutes</th>\n",
       "      <th>Total day calls</th>\n",
       "      <th>Total day charge</th>\n",
       "      <th>Total eve minutes</th>\n",
       "      <th>Total eve calls</th>\n",
       "      <th>Total eve charge</th>\n",
       "      <th>Total night minutes</th>\n",
       "      <th>Total night calls</th>\n",
       "      <th>Total night charge</th>\n",
       "      <th>Total intl minutes</th>\n",
       "      <th>Total intl calls</th>\n",
       "      <th>Total intl charge</th>\n",
       "      <th>Customer service calls</th>\n",
       "      <th>Churn</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>KS</td>\n",
       "      <td>128</td>\n",
       "      <td>415</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>25</td>\n",
       "      <td>265.1</td>\n",
       "      <td>110</td>\n",
       "      <td>45.07</td>\n",
       "      <td>197.4</td>\n",
       "      <td>99</td>\n",
       "      <td>16.78</td>\n",
       "      <td>244.7</td>\n",
       "      <td>91</td>\n",
       "      <td>11.01</td>\n",
       "      <td>10.0</td>\n",
       "      <td>3</td>\n",
       "      <td>2.70</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>OH</td>\n",
       "      <td>84</td>\n",
       "      <td>408</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>299.4</td>\n",
       "      <td>71</td>\n",
       "      <td>50.90</td>\n",
       "      <td>61.9</td>\n",
       "      <td>88</td>\n",
       "      <td>5.26</td>\n",
       "      <td>196.9</td>\n",
       "      <td>89</td>\n",
       "      <td>8.86</td>\n",
       "      <td>6.6</td>\n",
       "      <td>7</td>\n",
       "      <td>1.78</td>\n",
       "      <td>2</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>OK</td>\n",
       "      <td>75</td>\n",
       "      <td>415</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>166.7</td>\n",
       "      <td>113</td>\n",
       "      <td>28.34</td>\n",
       "      <td>148.3</td>\n",
       "      <td>122</td>\n",
       "      <td>12.61</td>\n",
       "      <td>186.9</td>\n",
       "      <td>121</td>\n",
       "      <td>8.41</td>\n",
       "      <td>10.1</td>\n",
       "      <td>3</td>\n",
       "      <td>2.73</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>AL</td>\n",
       "      <td>118</td>\n",
       "      <td>510</td>\n",
       "      <td>True</td>\n",
       "      <td>False</td>\n",
       "      <td>0</td>\n",
       "      <td>223.4</td>\n",
       "      <td>98</td>\n",
       "      <td>37.98</td>\n",
       "      <td>220.6</td>\n",
       "      <td>101</td>\n",
       "      <td>18.75</td>\n",
       "      <td>203.9</td>\n",
       "      <td>118</td>\n",
       "      <td>9.18</td>\n",
       "      <td>6.3</td>\n",
       "      <td>6</td>\n",
       "      <td>1.70</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>MA</td>\n",
       "      <td>121</td>\n",
       "      <td>510</td>\n",
       "      <td>False</td>\n",
       "      <td>True</td>\n",
       "      <td>24</td>\n",
       "      <td>218.2</td>\n",
       "      <td>88</td>\n",
       "      <td>37.09</td>\n",
       "      <td>348.5</td>\n",
       "      <td>108</td>\n",
       "      <td>29.62</td>\n",
       "      <td>212.6</td>\n",
       "      <td>118</td>\n",
       "      <td>9.57</td>\n",
       "      <td>7.5</td>\n",
       "      <td>7</td>\n",
       "      <td>2.03</td>\n",
       "      <td>3</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "  State  Account length  ...    Customer service calls  Churn\n",
       "0    KS             128  ...                         1      0\n",
       "3    OH              84  ...                         2      0\n",
       "4    OK              75  ...                         3      0\n",
       "5    AL             118  ...                         0      0\n",
       "6    MA             121  ...                         3      0\n",
       "\n",
       "[5 rows x 20 columns]"
      ]
     },
     "execution_count": 31,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Remove the columns created earlier\n",
    "df.drop([\"Total charge\", \"Total calls\"], axis=1, inplace=True)\n",
    "# Drop rows 1 and 2; without inplace=True this returns a new DataFrame\n",
    "df.drop([1, 2]).head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Some notes on the code above:"
   ]
  },
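  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- Pass the labels to remove, `['Total charge', 'Total calls']`, together with the `axis` parameter to `drop` (1 deletes columns, 0 deletes rows; the default is 0).\n",
    "- The `inplace` parameter controls whether the original DataFrame is modified: `False` (the default) leaves the existing DataFrame untouched and returns a new one, while `True` modifies the current DataFrame in place and returns `None`."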
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Predicting churn"
   ]
  },
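  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "First, use the `crosstab()` method introduced above to build a contingency table showing how the International plan variable relates to Churn, and use the `countplot()` method to draw a count plot that visualizes the result."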
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Load the plotting modules\n",
    "import matplotlib.pyplot as plt\n",
    "import seaborn as sns"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7f673133a8d0>"
      ]
     },
     "execution_count": 33,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEKCAYAAAAFJbKyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFrlJREFUeJzt3X+w3XV95/HnCxIJi1QJRITc0CBGMWkVJUa2YAelAmZdEJeyoQpRcKMzqOg6dcHtCNpxxm61jhTX2VgpsGvJ2kUkZRELLLbgyo9EEJNQlyxCuVl+BquiBU147x/ne/EQ7k3ON7nnnpt7n4+ZM/d7Pt/P93PehzmTF99fn2+qCkmSerXHoAuQJO1eDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWZgy6gH444IADav78+YMuQ5J2K2vXrn28qubsqN+UDI758+ezZs2aQZchSbuVJA/00s9DVZKkVgwOSVIrBockqZUpeY5DkgblV7/6FcPDwzz11FODLmVMs2bNYmhoiJkzZ+7U9gaHJI2j4eFh9t13X+bPn0+SQZfzPFXF5s2bGR4e5tBDD92pMTxUJUnj6KmnnmL//feflKEBkIT9999/l/aIDA5JGmeTNTRG7Gp9BockqRWDQ5Im0MMPP8yyZcs47LDDOPLII1m6dCkrV67kbW9726BL65knx8dw5B9ePugSJo21f3rmoEuQpoSq4pRTTmH58uWsWrUKgO9///usXr16l8bdsmULM2ZM3D/n7nFI0gS56aabmDlzJu9///ufbXvNa17DG9/4Rp588klOPfVUDj/8cN75zndSVUBnCqXHH38cgDVr1nDssccCcOGFF3LGGWdw9NFHc8YZZ3DppZfyjne8gxNPPJEFCxbwsY99rG/fwz0OSZog69at48gjjxx13Z133sn69es5+OCDOfroo/nOd77DMcccs93xNmzYwC233MLee+/NpZdeyl133cWdd97JXnvtxStf+Uo++MEPMm/evHH/Hu5xSNIksGTJEoaGhthjjz044ogjuP/++3e4zUknncTee+/97PvjjjuOF73oRcyaNYuFCxfywAM9zVnYmsEhSRNk0aJFrF27dtR1e+2117PLe+65J1u2bAFgxowZPPPMMwDPu/din3326WmM8WZwSNIEefOb38zTTz/NypUrn227++67ufnmm8fcZv78+c+GzZVXXtn3GnthcEjSBEnCVVddxQ033MBhhx3GokWLOP/883npS1865jYXXHAB5557LosXL2bPPfecwGrHlpEz91PJ4sWLa1cf5OTluL/m5bhS7+655x5e9apXDbqMHRqtziRrq2rxjrZ1j0OS1ErfgiPJvCQ3JdmQZH2Sc5v2C5NsSnJX81ratc35STYm+WGSE7raT2zaNiY5r181S5J2rJ/3cWwBPlpV30uyL7A2yfXNus9X1We7OydZCCwDFgEHAzckeUWz+ovAW4Bh4I4kq6tqQx9rlySNoW/BUVUPAQ81yz9Lcg8wdzubnAysqqqngR8l2QgsadZtrKr7AJKsavoaHJI0ABNyjiPJfOC1wG1N0weS3J3kkiT7NW1zgQe7Nhtu2sZq3/YzViRZk2TNY489Ns7fQJI0ou/BkeSFwJXAh6vqp8CXgMOAI+jskXxuPD6nqlZW1eKqWjxnzpzxGFKSNIq+zlWVZCad0PhqVX0doKoe6Vr/ZeCa5u0moHtSlaGmje20S9KkNt6X9vd6efx1113Hueeey9atW3nve9/LeeeN33VF/byqKsBXgHuq6s+62g/q6nYKsK5ZXg0sS7JXkkOBBcDtwB3AgiSHJnkBnRPouzYHsSRNYVu3buWcc87hm9/8Jhs2bOCKK65gw4bxOy3czz2Oo4EzgB8kuatp+zhwepIjgALuB94HUFXrk3yNzknvLcA5VbUVIMkHgG8BewKXVNX6PtYtSbu1
22+/nZe//OW87GUvA2DZsmVcffXVLFy4cFzG7+dVVbcAoz3Y9trtbPNp4NOjtF+7ve0kSb+2adOm50ynPjQ0xG233badLdrxznFJUisGhyRNMXPnzuXBB399F8Pw8DBz527vNrp2DA5JmmJe//rXc++99/KjH/2IX/7yl6xatYqTTjpp3Mb30bGS1EeDmF16xowZXHzxxZxwwgls3bqVs846i0WLFo3f+OM2kiRp0li6dClLly7dcced4KEqSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJa8XJcSeqjf/zUb4/reId84gc77HPWWWdxzTXX8JKXvIR169btsH9b7nFI0hTz7ne/m+uuu65v4xsckjTF/O7v/i6zZ8/u2/gGhySpFYNDktSKwSFJasXgkCS14uW4ktRHvVw+O95OP/10vv3tb/P4448zNDTEJz/5Sc4+++xxG9/gkKQp5oorrujr+B6qkiS1YnBIkloxOCRpnFXVoEvYrl2tz+CQpHE0a9YsNm/ePGnDo6rYvHkzs2bN2ukxPDkuSeNoaGiI4eFhHnvssUGXMqZZs2YxNDS009sbHJI0jmbOnMmhhx466DL6ykNVkqRWDA5JUisGhySpFYNDktRK34IjybwkNyXZkGR9knOb9tlJrk9yb/N3v6Y9SS5KsjHJ3Ule1zXW8qb/vUmW96tmSdKO9XOPYwvw0apaCBwFnJNkIXAecGNVLQBubN4DvBVY0LxWAF+CTtAAFwBvAJYAF4yEjSRp4vUtOKrqoar6XrP8M+AeYC5wMnBZ0+0y4O3N8snA5dVxK/DiJAcBJwDXV9UTVfVj4HrgxH7VLUnavgk5x5FkPvBa4DbgwKp6qFn1MHBgszwXeLBrs+Gmbaz2bT9jRZI1SdZM5htvJGl31/fgSPJC4Ergw1X10+511bknf1zuy6+qlVW1uKoWz5kzZzyGlCSNoq/BkWQmndD4alV9vWl+pDkERfP30aZ9EzCva/Ohpm2sdknSAPTzqqoAXwHuqao/61q1Ghi5Mmo5cHVX+5nN1VVHAT9pDml9Czg+yX7NSfHjmzZJ0gD0c66qo4EzgB8kuatp+zjwGeBrSc4GHgBOa9ZdCywFNgK/AN4DUFVPJPlj4I6m36eq6ok+1i1J2o6+BUdV3QJkjNXHjdK/gHPGGOsS4JLxq06StLO8c1yS1IrBIUlqxeCQJLVicEiSWjE4JEmtGBySpFYMDklSKwaHJKkVg0OS1IrBIUlqxeCQJLVicEiSWjE4JEmtGBySpFYMDklSKwaHJKkVg0OS1IrBIUlqxeCQJLVicEiSWjE4JEmtGBySpFYMDklSKwaHJKkVg0OS1IrBIUlqxeCQJLVicEiSWjE4JEmt9BQcSW7spU2SNPVtNziSzEoyGzggyX5JZjev+cDcHWx7SZJHk6zrarswyaYkdzWvpV3rzk+yMckPk5zQ1X5i07YxyXk7+0UlSeNjxg7Wvw/4MHAwsBZI0/5T4OIdbHtp0+fybdo/X1Wf7W5IshBYBixqPuuGJK9oVn8ReAswDNyRZHVVbdjBZ0uS+mS7wVFVXwC+kOSDVfXnbQauqr9v9kx6cTKwqqqeBn6UZCOwpFm3saruA0iyqulrcEjSgOxojwOAqvrzJL8DzO/epqq23ZvoxQeSnAmsAT5aVT+mc9jr1q4+w/z6UNiD27S/YbRBk6wAVgAccsghO1GWJKkXvZ4c/6/AZ4FjgNc3r8U78XlfAg4DjgAeAj63E2OMqqpWVtXiqlo8Z86c8RpWkrSNnvY46ITEwqqqXfmwqnpkZDnJl4FrmrebgHldXYeaNrbTLkkagF7v41gHvHRXPyzJQV1vT2nGBVgNLEuyV5JDgQXA7cAdwIIkhyZ5AZ0T6Kt3tQ5J0s7rdY/jAGBDktuBp0caq+qksTZIcgVwLJ1LeYeBC4BjkxwBFHA/nau2qKr1Sb5G56T3FuCcqtrajPMB4FvAnsAlVbW+zReUJI2vXoPjwrYDV9Xp
ozR/ZTv9Pw18epT2a4Fr236+JKk/er2q6u/6XYgkaffQU3Ak+Rmdw0sALwBmAj+vqt/oV2GSpMmp1z2OfUeWk4TOTXhH9asoSdLk1Xp23Or4BnDCDjtLkqacXg9VvaPr7R507ut4qi8VSZImtV6vqvrXXctb6FxKe/K4VyNJmvR6Pcfxnn4XIknaPfQ6V9VQkqua52s8muTKJEP9Lk6SNPn0enL8L+lM9XFw8/qbpk2SNM30Ghxzquovq2pL87oUcApaSZqGeg2OzUnelWTP5vUuYHM/C5MkTU69BsdZwGnAw3Seo3Eq8O4+1SRJmsR6vRz3U8Dy5ml9JJlN58FOZ/WrMEnS5NTrHserR0IDoKqeAF7bn5IkSZNZr8GxR5L9Rt40exy97q1IkqaQXv/x/xzw3SR/3bz/fUZ5doYkaerr9c7xy5OsAd7cNL2jqjb0ryxJ0mTV8+GmJigMC0ma5lpPqy5Jmt4MDklSKwaHJKkVg0OS1IrBIUlqxeCQJLVicEiSWjE4JEmtGBySpFYMDklSKwaHJKmVvgVHkkuSPJpkXVfb7CTXJ7m3+btf054kFyXZmOTuJK/r2mZ50//eJMv7Va8kqTf93OO4FDhxm7bzgBuragFwY/Me4K3Agua1AvgSPPvcjwuANwBLgAu6nwsiSZp4fQuOqvp74Iltmk8GLmuWLwPe3tV+eXXcCrw4yUHACcD1VfVE8wTC63l+GEmSJtBEn+M4sKoeapYfBg5slucCD3b1G27axmqXJA3IwE6OV1UBNV7jJVmRZE2SNY899th4DStJ2sZEB8cjzSEomr+PNu2bgHld/YaatrHan6eqVlbV4qpaPGfOnHEvXJLUMdHBsRoYuTJqOXB1V/uZzdVVRwE/aQ5pfQs4Psl+zUnx45s2SdKA9Pzo2LaSXAEcCxyQZJjO1VGfAb6W5GzgAeC0pvu1wFJgI/AL4D0AVfVEkj8G7mj6faqqtj3hLkmaQH0Ljqo6fYxVx43St4BzxhjnEuCScSxNkrQLvHNcktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWBhIcSe5P8oMkdyVZ07TNTnJ9knubv/s17UlyUZKNSe5O8rpB1CxJ6hjkHsebquqIqlrcvD8PuLGqFgA3Nu8B3gosaF4rgC9NeKWSpGdNpkNVJwOXNcuXAW/var+8Om4FXpzkoEEUKEkaXHAU8LdJ1iZZ0bQdWFUPNcsPAwc2y3OBB7u2HW7aJEkDMGNAn3tMVW1K8hLg+iT/0L2yqipJtRmwCaAVAIcccsj4VSpJeo6B7HFU1abm76PAVcAS4JGRQ1DN30eb7puAeV2bDzVt2465sqoWV9XiOXPm9LN8SZrWJjw4kuyTZN+RZeB4YB2wGljedFsOXN0srwbObK6uOgr4SdchLUnSBBvEoaoDgauSjHz+X1XVdUnuAL6W5GzgAeC0pv+1wFJgI/AL4D0TX7IkacSEB0dV3Qe8ZpT2zcBxo7QXcM4ElCZJ6sFkuhxXkrQbMDgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWplUHNVaTfyj5/67UGXMGkc8okfDLoEaeDc45AktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa04V5W0GzryDy8fdAmTxto/PXPQJUw77nFIkloxOCRJrRgckqRWPMchabfm82J+baKeF+MehySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrew2wZHkxCQ/TLIxyXmDrkeSpqvdIjiS7Al8EXgrsBA4PcnCwVYlSdPTbhEcwBJgY1XdV1W/BFYBJw+4Jkma
lnaX4JgLPNj1frhpkyRNsCkz5UiSFcCK5u2TSX44yHqmkt+EA4DHB13HpHBBBl2BtuHvs8uu/z5/s5dOu0twbALmdb0fatqeVVUrgZUTWdR0kWRNVS0edB3SaPx9Trzd5VDVHcCCJIcmeQGwDFg94JokaVraLfY4qmpLkg8A3wL2BC6pqvUDLkuSpqXdIjgAqupa4NpB1zFNeQhQk5m/zwmWqhp0DZKk3cjuco5DkjRJ7DaHqjR+kmwFuh8V9vaqun+MvvOBa6rqt/pfmQRJ9gdubN6+FNgKPNa8X9LcBKwBMjimp3+uqiMGXYQ0mqraDBwBkORC4Mmq+mx3nyShc6j9mYmvUB6qEtDZs0hyc5LvNa/fGaXPoiS3J7kryd1JFjTt7+pq/y/N3GLSuEry8iQbknwVWA/MS/JPXeuXJfmLZvnAJF9Psqb5bR41qLqnIoNjetq7+Uf+riRXNW2PAm+pqtcB/xa4aJTt3g98odlbWQwMJ3lV0//opn0r8M7+fwVNU4cDn6+qhWxzE/A2LgL+U3Nj4GnAX0xEcdOFh6qmp9EOVc0ELk4y8o//K0bZ7rvAf0wyBHy9qu5NchxwJHBH5+gBe9MJIakf/m9Vremh3+8Br2x+kwD7Jdm7qv65f6VNHwaHRnwEeAR4DZ090ae27VBVf5XkNuBfAdcmeR8Q4LKqOn8ii9W09fOu5Wfo/P5GzOpaDp5I7xsPVWnEi4CHmpONZ9C5Q/85krwMuK+qLgKuBl5N5+qXU5O8pOkzO0lPE6VJu6L5rf44yYIkewCndK2+AThn5E2zJ61xYnBoxH8Glif5Pp3jyD8fpc9pwLokdwG/BVxeVRuAPwL+NsndwPXAQRNUs/Qf6ExF9L/pPG5hxDnA0c1FHBuAfzeI4qYq7xyXJLXiHockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTi020vyZA99PpzkX0xALfOT/EHX+8VJRpu+ZVc/5/4kB7Tof2mSU8e7Dk1PBoemiw8DrYJjJydrnA88GxxVtaaqPrQT40iTlsGhKSPJsUm+neR/JPmHJF9Nx4eAg4GbktzU9D0+yXebmYD/OskLm/b7k/xJku8Bv9+M9yfNDKv/J8kbm35jzSb8GeCNzQSSH2lquqbZZnaSbzQ3pd2a5NVN+4VJLmk+676m3pHv9I0ka5OsT7Kih/8GTyb5fNP/xiRzRunziSR3JFmXZGUzRTljfVdpWwaHpprX0tm7WAi8jM6svRcB/w94U1W9qTnE80fA7zWzAa8B/n3XGJur6nVVtap5P6OqljTjXtC0jTWb8HnAzVV1RFV9fpvaPgncWVWvBj4OXN617nDgBGAJcEGSmU37WVV1JJ3ZiD+UzkOOtmcfYE1VLQL+rqvebhdX1eubh3PtDbyta91o31V6Dic51FRze1UNAzRTo8wHbtmmz1F0guU7zf9sv4DOzL8j/vs2/b/e/F3bjAe9zSa8rWOAfwNQVf8ryf5JfqNZ9z+r6mng6SSPAgfSmULjQ0lG5mCaBywANm/nM57pqv+/ddXe7U1JPkbn0N1sOs+2+JvtfFfpOQwOTTVPdy1vZfTfeIDrq+r0McbYdp6ukTG7x9vhbMItPa/uJMfSmR78X1bVL5J8m+fOANuL58wplGQWnXnJFlfVg+k8Ya97zNG+q/QcHqrSdPEzYN9m+VY6E+C9HCDJPkl62WPoNtZswt2fs62baR5y1YTC41X10x18xo+b0Diczp7SjuwBjFw99Qc8f29rJCQeb87reKWVWjM4NF2sBK5LclNVPQa8G7iimdH3u3TOMbQx1mzCdwNbk3w/yUe22eZC4MjmMz8DLN/BZ1xHZ8/jnqb/rT3U9XNgSZJ1wJuBT3WvrKp/Ar4MrKMzq+wdPYwpPYez40pTSJInq+qFg65DU5t7HJKkVtzjkCS14h6HJKkVg0OS1IrBIUlqxeCQJLVicEiSWjE4JEmt/H+Y7QHeYduEvgAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(x=\"International plan\", hue=\"Churn\", data=df)"
   ]
  },
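  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot shows that the churn rate is much higher among customers who have an international plan, which is an interesting observation. Perhaps the high charges for international calls leave these customers dissatisfied."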
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the same way, examine how the Customer service calls variable relates to Churn, and visualize the result."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 34,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Customer service calls</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>2</th>\n",
       "      <th>3</th>\n",
       "      <th>4</th>\n",
       "      <th>5</th>\n",
       "      <th>6</th>\n",
       "      <th>7</th>\n",
       "      <th>8</th>\n",
       "      <th>9</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Churn</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>605</td>\n",
       "      <td>1059</td>\n",
       "      <td>672</td>\n",
       "      <td>385</td>\n",
       "      <td>90</td>\n",
       "      <td>26</td>\n",
       "      <td>8</td>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>2850</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>92</td>\n",
       "      <td>122</td>\n",
       "      <td>87</td>\n",
       "      <td>44</td>\n",
       "      <td>76</td>\n",
       "      <td>40</td>\n",
       "      <td>14</td>\n",
       "      <td>5</td>\n",
       "      <td>1</td>\n",
       "      <td>2</td>\n",
       "      <td>483</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>697</td>\n",
       "      <td>1181</td>\n",
       "      <td>759</td>\n",
       "      <td>429</td>\n",
       "      <td>166</td>\n",
       "      <td>66</td>\n",
       "      <td>22</td>\n",
       "      <td>9</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>3333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Customer service calls    0     1    2    3    4   5   6  7  8  9   All\n",
       "Churn                                                                  \n",
       "0                       605  1059  672  385   90  26   8  4  1  0  2850\n",
       "1                        92   122   87   44   76  40  14  5  1  2   483\n",
       "All                     697  1181  759  429  166  66  22  9  2  2  3333"
      ]
     },
     "execution_count": 34,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df[\"Churn\"], df[\"Customer service calls\"], margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7f672e7bfda0>"
      ]
     },
     "execution_count": 35,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAEKCAYAAAAFJbKyAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAGjdJREFUeJzt3X2UVfV97/H3R0AG0SIg8YEhHUSuBpL4wPjQoNZIGpVYUEu8uhQxmkWyFipJWo323hUTk5trlmmN1jYtSxRojcSoROry2iBKormKGQSRh1ipTwyVMKIxsbmo4Pf+sX+DR2CYs4c5e59hPq+1zjp7//Zv7/09I85n9sP5bUUEZmZm1dqn7ALMzKxncXCYmVkuDg4zM8vFwWFmZrk4OMzMLBcHh5mZ5eLgMDOzXBwcZmaWi4PDzMxy6Vt2AbVw0EEHRVNTU9llmJn1KMuWLXs9IoZ11m+vDI6mpiZaWlrKLsPMrEeR9Eo1/XyqyszMcnFwmJlZLg4OMzPLZa+8xmHWXd577z1aW1vZsmVL2aV0qKGhgcbGRvr161d2KdZLODjMdqO1tZUDDjiApqYmJJVdzk4igs2bN9Pa2srIkSPLLsd6CZ+qMtuNLVu2MHTo0LoMDQBJDB06tK6PiGzv4+Aw60S9hka7eq/P9j4ODjMzy8XBYdZFGzdu5IILLmDUqFGMGzeOiRMnMmvWLM4+++yySzOrKV8cL8i4q+d1ab1lN13SzZVYd4gIzj33XKZNm8b8+fMBePbZZ1m4cOEebXfr1q307ev/La2++YjDrAsee+wx+vXrx5e//OXtbUcffTSnnHIKb7/9NlOmTOGoo47ioosuIiKAbCic119/HYCWlhZOO+00AL75zW8ydepUxo8fz9SpU5kzZw7nnXceZ555JqNHj+aaa64p/POZ7Y7/tDHrglWrVjFu3LhdLlu+fDmrV6/msMMOY/z48fzyl7/k5JNP3u321qxZwxNPPMGAAQOYM2cOK1asYPny5fTv358jjzySK6+8khEjRtTio5jl5iMOs252wgkn0NjYyD777MMxxxzDyy+/3Ok6kyZNYsCAAdvnJ0yYwKBBg2hoaGDMmDG88kpVY8+ZFcLBYdYFY8eOZdmyZbtc1r9//+3Tffr0YevWrQD07duX999/H2Cn710MHDiwqm2Y1QMHh1kXnH766bzzzjvMmjVre9vKlSt5/PHHO1ynqalpe9jcd999Na/RrFYcHGZdIIkFCxbwyCOPMGrUKMaOHct1113HIYcc0uE6119/PTNnzqS5uZk+ffoUWK1Z91L7HR97k+bm5qi3Bzn5dtyeae3atXzsYx8ru4xO9ZQ6rb5JWhYRzZ31q9kRh6Q7JG2StKqibYikRZJeSO+DU7sk3SppnaSVko6rWGda6v+CpGm1qtfMzKpTy1NVc4Azd2i7FlgcEaOBxWke4CxgdHpNB34IWdAA1wMnAicA17eHjZmZlaNmwRERvwDe2KF5MjA3Tc8FzqlonxeZp4ADJR0KnAEsiog3IuJNYBE7h5GZmRWo6IvjB0fEa2l6I3Bwmh4OrK/o15raOmrfiaTpkloktbS1tXVv1WZmtl1pd1VFdlW+267MR8SsiGiOiOZhw4Z112bNzGwHRQfHb9IpKNL7ptS+AagcT6ExtXXUbmZmJSl6rKqFwDTgxvT+QEX7FZLmk10IfysiXpP0b8B3Ky6Ifxa4ruCazbbr6m3VHanmduuHH36YmTNnsm3bNr74xS9y7bXXdrqOWS3VLDgk3Q2cBhwkqZXs7qgbgXskXQ68Apyfuj8ETATWAX8AvgAQEW9I+jbwq9TvhojY8YK72V5r27ZtzJgxg0WLFtHY2Mjxxx/PpEmTGDNmTNmlWS9Ws+CIiAs7WDRhF30DmNHBdu4A7ujG0sx6jKeffpojjjiCww8/HIALLriABx54wMFhpfKQI2Z1bMOGDR8aTr2xsZEN
G3yZz8rl4DAzs1wcHGZ1bPjw4axf/8FXmVpbWxk+fJdfZTIrjIPDrI4df/zxvPDCC7z00ku8++67zJ8/n0mTJpVdlvVyfnSsWQ5Fj1bct29fbrvtNs444wy2bdvGZZddxtixYwutwWxHDg6zOjdx4kQmTpxYdhlm2/lUlZmZ5eLgMDOzXBwcZmaWi4PDzMxycXCYmVkuDg4zM8vFt+Oa5fDqDZ/o1u199BvPddrnsssu48EHH+QjH/kIq1at6tb9m3WFjzjM6tyll17Kww8/XHYZZts5OMzq3KmnnsqQIUPKLsNsOweHmZnl4uAwM7NcHBxmZpaLg8PMzHLx7bhmOVRz+2x3u/DCC1myZAmvv/46jY2NfOtb3+Lyyy8vvA6zdg4Oszp39913l12C2Yf4VJWZmeXi4DAzs1wcHGadiIiyS9iteq/P9j4ODrPdaGhoYPPmzXX7yzki2Lx5Mw0NDWWXYr2IL46b7UZjYyOtra20tbWVXUqHGhoaaGxsLLsM60UcHGa70a9fP0aOHFl2GWZ1xaeqzMwsFweHmZnl4uAwM7NcSgkOSV+VtFrSKkl3S2qQNFLSUknrJP1Y0r6pb/80vy4tbyqjZjMzyxQeHJKGA1cBzRHxcaAPcAHwPeDmiDgCeBNoH4zncuDN1H5z6mdmZiUp61RVX2CApL7AfsBrwOnAvWn5XOCcND05zZOWT5CkAms1M7MKhQdHRGwAvg+8ShYYbwHLgN9GxNbUrRUYnqaHA+vTultT/6E7blfSdEktklrq+Z57M7OeroxTVYPJjiJGAocBA4Ez93S7ETErIpojonnYsGF7ujkzM+tAGaeqPgO8FBFtEfEecD8wHjgwnboCaAQ2pOkNwAiAtHwQsLnYks3MrF0ZwfEqcJKk/dK1ignAGuAxYErqMw14IE0vTPOk5Y9GvQ4cZGbWC5RxjWMp2UXuZ4DnUg2zgK8DX5O0juwaxuy0ymxgaGr/GnBt0TWbmdkHShmrKiKuB67foflF4IRd9N0CfL6IuszMrHP+5riZmeXi4DAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlkspzxy38oy7el7udZbddEkNKjGznspHHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcqkqOCQtrqatWpIOlHSvpF9LWivpTyQNkbRI0gvpfXDqK0m3SlonaaWk47q6XzMz23O7DQ5JDZKGAAdJGpx+uQ+R1AQM34P93gI8HBFHAUcDa4FrgcURMRpYnOYBzgJGp9d04Id7sF8zM9tDnQ1y+CXgK8BhwDJAqf13wG1d2aGkQcCpwKUAEfEu8K6kycBpqdtcYAnwdWAyMC8iAngqHa0cGhGvVbtPD+xnZtZ9dnvEERG3RMRI4K8i4vCIGJleR0dEl4IDGAm0AXdKWi7pdkkDgYMrwmAjcHCaHg6sr1i/lV0c7UiaLqlFUktbW1sXSzMzs85UNax6RPydpE8BTZXrRET+P+Wz9Y8DroyIpZJu4YPTUu3bDUmRZ6MRMQuYBdDc3JxrXTMzq15VwSHpn4FRwApgW2oOoCvB0Qq0RsTSNH8vWXD8pv0UlKRDgU1p+QZgRMX6janNzMxKUO2DnJqBMek6wx6JiI2S1ks6MiKeByYAa9JrGnBjen8grbIQuELSfOBE4K081zfMzKx7VRscq4BDgO76hX0lcJekfYEXgS+QXW+5R9LlwCvA+anvQ8BEYB3wh9TXzMxKUm1wHASskfQ08E57Y0RM6spOI2IF2VHMjibsom8AM7qyHzMz637VBsc3a1mEmZn1HNXeVfXzWhdiZmY9Q7V3Vf2e7C4qgH2BfsB/RcQf1aowMzOrT9UecRzQPi1JZN/mPqlWRZmZ
Wf3KPTpuZH4KnFGDeszMrM5Ve6rqvIrZfcjuiNpSk4rMzKyuVXtX1Z9XTG8FXiY7XWVmZr1Mtdc4/KU7MzMDqn+QU6OkBZI2pdd9khprXZyZmdWfai+O30k2ZtRh6fWvqc3MzHqZaoNjWETcGRFb02sOMKyGdZmZWZ2qNjg2S7pYUp/0uhjYXMvCzMysPlUbHJeRjVa7kWyE3CmkR7+amVnvUu3tuDcA0yLiTQBJQ4DvkwWKmZn1ItUecXyyPTQAIuIN4NjalGRmZvWs2uDYR9Lg9pl0xFHt0YqZme1Fqv3l/zfAk5J+kuY/D/yv2pRkZmb1rNpvjs+T1AKcnprOi4g1tSvLzMzqVdWnm1JQOCzMzHq53MOqm5lZ7+bgMDOzXBwcZmaWi4PDzMxycXCYmVkuDg4zM8vFwWFmZrk4OMzMLBcHh5mZ5eLgMDOzXBwcZmaWS2nBkR5Bu1zSg2l+pKSlktZJ+rGkfVN7/zS/Li1vKqtmMzMr94hjJrC2Yv57wM0RcQTwJnB5ar8ceDO135z6mZlZSUoJDkmNwOeA29O8yIZsvzd1mQuck6Ynp3nS8gmpv5mZlaCsI44fANcA76f5ocBvI2Jrmm8Fhqfp4cB6gLT8rdTfzMxKUHhwSDob2BQRy7p5u9MltUhqaWtr685Nm5lZhTKOOMYDkyS9DMwnO0V1C3CgpPYHSzUCG9L0BmAEQFo+CNi840YjYlZENEdE87Bhw2r7CczMerHCgyMirouIxohoAi4AHo2Ii4DHgCmp2zTggTS9MM2Tlj8aEVFgyWZmVqHqR8cW4OvAfEnfAZYDs1P7bOCfJa0D3iALG+vBxl09r0vrLbvpkm6uxMy6otTgiIglwJI0/SJwwi76bAE+X2hhZmbWIX9z3MzMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeVSeHBIGiHpMUlrJK2WNDO1D5G0SNIL6X1wapekWyWtk7RS0nFF12xmZh8o44hjK/CXETEGOAmYIWkMcC2wOCJGA4vTPMBZwOj0mg78sPiSzcysXeHBERGvRcQzafr3wFpgODAZmJu6zQXOSdOTgXmReQo4UNKhBZdtZmZJqdc4JDUBxwJLgYMj4rW0aCNwcJoeDqyvWK01tZmZWQlKCw5J+wP3AV+JiN9VLouIACLn9qZLapHU0tbW1o2VmplZpVKCQ1I/stC4KyLuT82/aT8Fld43pfYNwIiK1RtT24dExKyIaI6I5mHDhtWueDOzXq6Mu6oEzAbWRsTfVixaCExL09OAByraL0l3V50EvFVxSsvMzArWt4R9jgemAs9JWpHa/hq4EbhH0uXAK8D5adlDwERgHfAH4AvFlmtmZpUKD46IeAJQB4sn7KJ/ADNqWpSZmVXN3xw3M7NcHBxmZpaLg8PMzHJxcJiZWS4ODjMzy8XBYWZmuTg4zMwsFweHmZnl4uAwM7NcHBxmZpZLGWNVWQ6v3vCJ3Ot89BvP1aASM7OMjzjMzCwXB4eZmeXi4DAzs1wcHGZmlouDw8zMcvFdVR3w3Ux7v3FXz8u9zrKbLqlBJWY9i4PDOuUQNbNKPlVlZma5+IjDLAcffZn5iMPMzHJycJiZWS4ODjMzy8XBYWZmuTg4zMwsF99VZT2G72gyqw8ODrMepisBCg5R6z4+VWVmZrk4OMzMLBefqjIrUVcGWlxwQA0KMcuhxxxxSDpT0vOS1km6tux6zMx6qx5xxCGpD/D3wJ8BrcCvJC2MiDXlVmbWe/kut96rRwQHcAKwLiJeBJA0H5gMODjMuoFPmVkePSU4hgPrK+ZbgRNLqsXMaqBr4XVT
l/a1Nx75FHmbtiKiSzsrkqQpwJkR8cU0PxU4MSKuqOgzHZieZo8Ent/D3R4EvL6H2+gO9VBHPdQA9VGHa/hAPdRRDzVAfdTRHTX8cUQM66xTTzni2ACMqJhvTG3bRcQsYFZ37VBSS0Q0d9f2enId9VBDvdThGuqrjnqooV7qKLKGnnJX1a+A0ZJGStoXuABYWHJNZma9Uo844oiIrZKuAP4N6APcERGrSy7LzKxX6hHBARARDwEPFbjLbjvttYfqoY56qAHqow7X8IF6qKMeaoD6qKOwGnrExXEzM6sfPeUah5mZ1QkHxy6UPbyJpDskbZK0quh971DHCEmPSVojabWkmSXU0CDpaUnPphq+VXQNFbX0kbRc0oMl1vCypOckrZDUUmIdB0q6V9KvJa2V9CcF7//I9DNof/1O0leKrCHV8dX073KVpLslNRRdQ6pjZqphdRE/B5+q2kEa3uTfqRjeBLiwyOFNJJ0KvA3Mi4iPF7XfXdRxKHBoRDwj6QBgGXBOwT8LAQMj4m1J/YAngJkR8VRRNVTU8jWgGfijiDi76P2nGl4GmiOi1O8MSJoLPB4Rt6c7HfeLiN+WVEsfstvzT4yIVwrc73Cyf49jIuL/SboHeCgi5hRVQ6rj48B8shE23gUeBr4cEetqtU8fcexs+/AmEfEu2X+QyUUWEBG/AN4ocp8d1PFaRDyTpn8PrCX7Fn+RNUREvJ1m+6VX4X/tSGoEPgfcXvS+642kQcCpwGyAiHi3rNBIJgD/UWRoVOgLDJDUF9gP+M8SavgYsDQi/hARW4GfA+fVcocOjp3taniTQn9Z1iNJTcCxwNIS9t1H0gpgE7AoIgqvAfgBcA3wfgn7rhTAzyQtS6MllGEk0AbcmU7d3S5pYEm1QPa9rruL3mlEbAC+D7wKvAa8FRE/K7oOYBVwiqShkvYDJvLhL0x3OweHdUrS/sB9wFci4ndF7z8itkXEMWQjBpyQDs0LI+lsYFNELCtyvx04OSKOA84CZqTTmkXrCxwH/DAijgX+CyjlUQfpNNkk4Ccl7Hsw2dmIkcBhwEBJFxddR0SsBb4H/IzsNNUKYFst9+ng2Fmnw5v0Jum6wn3AXRFxf5m1pNMhjwFnFrzr8cCkdH1hPnC6pH8puAZg+1+5RMQmYAHZqdWitQKtFUd+95IFSRnOAp6JiN+UsO/PAC9FRFtEvAfcD3yqhDqIiNkRMS4iTgXeJLtOWzMOjp15eJMkXZieDayNiL8tqYZhkg5M0wPIblr4dZE1RMR1EdEYEU1k/x4ejYjC/7KUNDDdpEA6NfRZstMUhYqIjcB6SUempgmU94iDCynhNFXyKnCSpP3S/ysTyK4DFk7SR9L7R8mub/yolvvrMd8cL0o9DG8i6W7gNOAgSa3A9RExu8gakvHAVOC5dI0B4K/Tt/iLcigwN905sw9wT0SUdjtsyQ4GFmS/o+gL/CgiHi6pliuBu9IfVy8CXyi6gBSefwZ8qeh9A0TEUkn3As8AW4HllPcN8vskDQXeA2bU+mYF345rZma5+FSVmZnl4uAwM7NcHBxmZpaLg8PMzHJxcJiZWS4ODqs7kg6RNF/Sf6ShNR6S9N+6sJ1zJI2pRY1lktQs6dYS9vt2em8qe+RmK5eDw+pK+iLVAmBJRIyKiHHAdWTfYcjrHKDQ4EjfN+mO7XT4HauIaImIq7pjP2Zd4eCwevNp4L2I+Mf2hoh4NiIel3Ra5bMwJN0m6dI0fWN6bshKSd+X9CmyMYxuSs9rGCXpGElPpT4L0lhDSFoi6WZJLenZEsdLul/SC5K+U7G/i5U9G2SFpH9qDwlJb0v6G0nPAh96LoWkqyrqmp/aBip75srTaZDAyan9UkkLJT0KLE5HXZ+r2NYcSVMqfw6S9pd0p7JndKyU9Bep/bOSnpT0jKSfpPHG2KG2IyQ9ouxZJ8+kn9H+khan+efaa+uIpLEVP5OVkkZX8x/ZeriI8MuvunkBVwE3d7DsNODBivnbgEuB
ocDzfPCF1gPT+xxgSkX/lcCfpukbgB+k6SXA99L0TLKhsQ8F+pONyzSUbOjqfwX6pX7/AFySpgM4v4Oa/xPov0Nd3wUubm8jG1doYPosrcCQtOxcYG6a3pds1OYBlT8HssHtflCxv8HAQcAvyJ5jAvB14Bu7qG0pcG6abiAbFrwv2fNGSNtZV/FzfTu9NwGr0vTfARdV1Dig7H9DftX+5SFHbG/wFrAFmJ3+Et9pSBJlz5A4MCJ+nprm8uERVdvHI3sOWB0Rr6X1XiQb9PJkYBzwqzTkxwCyYd4hG4n0vg5qW0k2NMdPgZ+mts+SDZr4V2m+Afhoml4UEe3PYvk/wC2S+pMN7PiLyB4YVLn9z5CNnwVARLypbDTfMcAvU999gSd3+HkcAAyPiAVpvS2pvR/wXWWj7r5P9kiBg4GNHXy+J4H/oex5JfdHxAsd9LO9iIPD6s1qYEoHy7by4dOrDbB9fLETyAaZmwJcAZyec7/vpPf3K6bb5/sCIvvr/7pdrLslIjoaxvpzZA89+nOyX7CfSNv6i4h4vrKjpBPJhignfa4tkpYAZwD/nWxk3mqILIAurLJ/pYuAYcC4iHhP2YjAHT4ONSJ+JGkp2ed8SNKXIuLRLuzXehBf47B68yjQXxUPKZL0SUmnAK8AYyT1VzZi7oS0fH9gUGSDL34VODqt+nvgAICIeAt4M20HssEb248+qrEYmKIPRiEdIumPd7eCpH2AERHxGNnpokHA/mQDaF6ZbgRA0rG72cyPyQYQPIXsWQs7WgTMqNjnYOApYLykI1LbwB3vSovsiY6tks5JfforewjQILJnj7wn6dNAZ5/xcODFiLgVeAD45O76297BwWF1JSKC7Nz+Z5Tdjrsa+N/AxohYD9xDNpT4PWSjkUIWDg9KWkn2DOivpfb5wNXpAvQoYBrZxfKVwDFk1zmqrWsN8D/Jnr63kuwX9qGdrNYH+BdJz6Vab41s1NJvkz0Cd2X6fN/ezTZ+Bvwp8EhkjzLe0XeAwZJWpYvzn46INrLrJXenWp8EjtrFulOBq1Kf/wscAtwFNKeaL6HzIezPB1YpGz3548C8TvrbXsCj45qZWS4+4jAzs1wcHGZmlouDw8zMcnFwmJlZLg4OMzPLxcFhZma5ODjMzCwXB4eZmeXy/wGq9SKtjYhMYwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(x=\"Customer service calls\", hue=\"Churn\", data=df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The plot shows that the churn rate rises sharply once a customer has made 4 or more calls to customer service."
   ]
  },
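  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To highlight the relationship between Customer service calls and Churn more clearly, add a binary feature `Many_service_calls` to the DataFrame that marks customers who called customer service more than 3 times (`Customer service calls > 3`). Then look at how it relates to churn and visualize the result."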
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Churn</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "      <th>All</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Many_service_calls</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>2721</td>\n",
       "      <td>345</td>\n",
       "      <td>3066</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>129</td>\n",
       "      <td>138</td>\n",
       "      <td>267</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>All</th>\n",
       "      <td>2850</td>\n",
       "      <td>483</td>\n",
       "      <td>3333</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Churn                  0    1   All\n",
       "Many_service_calls                 \n",
       "0                   2721  345  3066\n",
       "1                    129  138   267\n",
       "All                 2850  483  3333"
      ]
     },
     "execution_count": 36,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df[\"Many_service_calls\"] = (df[\"Customer service calls\"] > 3).astype(\"int\")\n",
    "\n",
    "pd.crosstab(df[\"Many_service_calls\"], df[\"Churn\"], margins=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 37,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<matplotlib.axes._subplots.AxesSubplot at 0x7f672e745ef0>"
      ]
     },
     "execution_count": 37,
     "metadata": {},
     "output_type": "execute_result"
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAY4AAAELCAYAAADOeWEXAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDMuMC4zLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvnQurowAAFmhJREFUeJzt3XuwXWWd5vHvQxIIjWnkEhFy0ESMKFFBCWgP2sOIAmZsEYvR0DaGi412oYUzdjvozAiidjne28tYlR5joLsl0oUMGTrCII3tZVRI5BZC06QEmpMBCcHCK2jib/7YK7AN55ycBXuffU7O91O166z9rne9+7dTyXnyrrX2u1NVSJI0XrsNugBJ0tRicEiSWjE4JEmtGBySpFYMDklSKwaHJKkVg0OS1IrBIUlqxeCQJLUyc9AF9MP+++9f8+fPH3QZkjSlrFu37sGqmruzfrtkcMyfP5+1a9cOugxJmlKS3DOefp6qkiS1YnBIkloxOCRJreyS1zgkaVB+85vfMDw8zCOPPDLoUkY1e/ZshoaGmDVr1pM63uCQpB4aHh5mzpw5zJ8/nySDLucJqootW7YwPDzMggULntQYnqqSpB565JFH2G+//SZlaAAkYb/99ntKMyKDQ5J6bLKGxnZPtT6DQ5LUisEhSRPo/vvvZ+nSpRxyyCEceeSRLFmyhOXLl/O6171u0KWNmxfHR3HkX1w86BImjXUff+ugS5B2CVXFySefzLJly1i1ahUAN998M6tXr35K427dupWZMyfu17kzDkmaINdddx2zZs3iHe94x2Nthx9+OK985Sv5+c9/zimnnMLzn/983vKWt1BVQGcJpQcffBCAtWvXcuyxxwJwwQUXcNppp3HMMcdw2mmnsXLlSt74xjdy4oknsnDhQt773vf27X0445CkCbJ+/XqOPPLIEffdeOON3HbbbRx00EEcc8wxfPe73+UVr3jFmONt2LCB73znO+y5556sXLmSm266iRtvvJE99tiDQw89lHe9610cfPDBPX8fzjgkaRI4+uijGRoaYrfdduOII47g7rvv3ukxr3/969lzzz0fe37cccex9957M3v2bA477DDuuWdcaxa2ZnBI0gRZtGgR69atG3HfHnvs8dj2jBkz2Lp1KwAzZ87kt7/9LcATPnux1157jWuMXjM4JGmCvOpVr+LRRx9l+fLlj7XdcsstfPvb3x71mPnz5z8WNpdddlnfaxwPg0OSJkgSLr/8cr7xjW9wyCGHsGjRIt73vvfxzGc+c9Rjzj//fM4991wWL17MjBkzJrDa0WX7lftdyeLFi+upfpGTt+M+zttxpfG7/fbbecELXjDoMnZqpDqTrKuqxTs71hmHJKkVg0OS1IrBIUlqxeCQJLVicEiSWulbcCQ5OMl1STYkuS3JuU37BUk2JbmpeSzpOuZ9STYmuSPJCV3tJzZtG5Oc16+aJUk718+1qrYC76mqHyaZA6xLck2z79NV9YnuzkkOA5YCi4CDgG8keV6z+wvAa4Bh4IYkq6tqQx9rl6Se6PWt/eO9Pf6qq67i3HPPZdu2bbztbW/jvPN693/uvs04quq+qvphs/0z4HZg3hiHnASsqqpHq+ouYCNwdPPYWFU/qqpfA6uavpKkEWzbto1zzjmHr3/962zYsIFLLrmEDRt693/tCbnGkWQ+8BLgB03TO5PckmRFkn2atnnAvV2HDTdto7VLkkZw/fXX89znPpfnPOc57L777ixdupQrrriiZ+P3PTiSPA24DHh3Vf0U+CJwCHAEcB/wyR69ztlJ1iZZu3nz5l4MKUlT0qZNm35nOfWhoSE2bdrUs/H7GhxJZtEJjb+rqq8BVNWPq2pbVf0W+Gs6p6IANgHdC8cPNW2jtf+OqlpeVYuravHcuXN7/2YkSUB/76oK8CXg9qr6VFf7gV3dTgbWN9urgaVJ9kiyAFgIXA/cACxMsiDJ7nQu
oD+171mUpF3YvHnzuPfex8/wDw8PM29e787w9/OuqmOA04Bbk9zUtL0fODXJEUABdwNvB6iq25JcCmygc0fWOVW1DSDJO4GrgRnAiqq6rY91S9KUdtRRR3HnnXdy1113MW/ePFatWsVXvvKVno3ft+Coqu8AGWHXmjGO+QjwkRHa14x1nCRNVoNYXXrmzJl8/vOf54QTTmDbtm2ceeaZLFq0qHfj92wkSdKksWTJEpYsWbLzjk+CS45IkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKt+NKUh/964Uv6ul4z/rArTvtc+aZZ3LllVfyjGc8g/Xr1++0f1vOOCRpF3P66adz1VVX9W18g0OSdjF/+Id/yL777tu38Q0OSVIrBockqRWDQ5LUisEhSWrF23ElqY/Gc/tsr5166ql885vf5MEHH2RoaIgPfvCDnHXWWT0b3+CQpF3MJZdc0tfxPVUlSWrF4JAktWJwSFKPVdWgSxjTU63P4JCkHpo9ezZbtmyZtOFRVWzZsoXZs2c/6TG8OC5JPTQ0NMTw8DCbN28edCmjmj17NkNDQ0/6eINDknpo1qxZLFiwYNBl9JWnqiRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS10rfgSHJwkuuSbEhyW5Jzm/Z9k1yT5M7m5z5Ne5J8NsnGJLckeWnXWMua/ncmWdavmiVJO9fPGcdW4D1VdRjwcuCcJIcB5wHXVtVC4NrmOcBrgYXN42zgi9AJGuB84GXA0cD528NGkjTx+hYcVXVfVf2w2f4ZcDswDzgJuKjpdhHwhmb7JODi6vg+8PQkBwInANdU1UNV9RPgGuDEftUtSRrbhFzjSDIfeAnwA+CAqrqv2XU/cECzPQ+4t+uw4aZttHZJ0gD0PTiSPA24DHh3Vf20e191lo/syRKSSc5OsjbJ2sm8uJgkTXV9DY4ks+iExt9V1dea5h83p6Bofj7QtG8CDu46fKhpG639d1TV8qpaXFWL586d29s3Ikl6TD/vqgrwJeD2qvpU167VwPY7o5YBV3S1v7W5u+rlwMPNKa2rgeOT7NNcFD++aZMkDUA/l1U/BjgNuDXJTU3b+4GPApcmOQu4B3hTs28NsATYCPwSOAOgqh5K8iHghqbfhVX1UB/rliSNoW/BUVXfATLK7uNG6F/AOaOMtQJY0bvqJElPlp8clyS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloxOCRJrRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1YnBIkloZV3AkuXY8bZKkXd/MsXYmmQ38HrB/kn2ANLt+H5jX59okSZPQmMEBvB14N3AQsI7Hg+OnwOf7WJckaZIa81RVVf1VVS0A/ryqnlNVC5rH4VU1ZnAkWZHkgSTru9ouSLIpyU3NY0nXvvcl2ZjkjiQndLWf2LRtTHLeU3ivkqQe2NmMA4Cq+lySfwPM7z6mqi4e47CVdGYlO/b5dFV9orshyWHAUmARndnNN5I8r9n9BeA1wDBwQ5LVVbVhPHVLknpvXMGR5G+AQ4CbgG1Nc/HEUHhMVX0ryfxx1nESsKqqHgXuSrIROLrZt7GqftTUsarpa3BI0oCMKziAxcBhVVU9eM13JnkrsBZ4T1X9hM6F9u939Rnm8Yvv9+7Q/rIe1CBJepLG+zmO9cAze/B6X6QzczkCuA/4ZA/GBCDJ2UnWJlm7efPmXg0rSdrBeGcc+wMbklwPPLq9sape3+bFqurH27eT/DVwZfN0E3BwV9ehpo0x2nccezmwHGDx4sW9mBlJkkYw3uC4oBcvluTAqrqveXoynZkMwGrgK0k+Refi+ELgejq3/y5MsoBOYCwF/rgXtUiSnpzx3lX1T20HTnIJcCydDw8OA+cDxyY5gs6F9bvpfE6EqrotyaV0
LnpvBc6pqm3NOO8ErgZmACuq6ra2tUiSeme8d1X9jM4ve4DdgVnAL6rq90c7pqpOHaH5S2P0/wjwkRHa1wBrxlOnJKn/xjvjmLN9O0no3BL78n4VJUmavFqvjlsd/ws4YaedJUm7nPGeqnpj19Pd6Hyu45G+VCRJmtTGe1fVH3Vtb6VzYfuknlcjSZr0xnuN44x+FyJJmhrG+0VOQ0kub1a7fSDJZUmG+l2cJGnyGe/F8S/T+ZDeQc3jfzdtkqRpZrzBMbeqvlxVW5vHSmBuH+uSJE1S4w2OLUn+JMmM5vEnwJZ+FiZJmpzGGxxnAm8C7qezqu0pwOl9qkmSNImN93bcC4FlzXdnkGRf4BN0AkWSNI2Md8bx4u2hAVBVDwEv6U9JkqTJbLzBsVuSfbY/aWYc452tSJJ2IeP95f9J4HtJ/r55/h8YYSVbSdKub7yfHL84yVrgVU3TG6tqQ//KkiRNVuM+3dQEhWEhSdNc62XVJUnTm8EhSWrF4JAktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa30LTiSrEjyQJL1XW37JrkmyZ3Nz32a9iT5bJKNSW5J8tKuY5Y1/e9Msqxf9UqSxqefM46VwIk7tJ0HXFtVC4Frm+cArwUWNo+zgS/CY980eD7wMuBo4PzubyKUJE28vgVHVX0LeGiH5pOAi5rti4A3dLVfXB3fB56e5EDgBOCaqnqo+c7za3hiGEmSJtBEX+M4oKrua7bvBw5otucB93b1G27aRmuXJA3IwC6OV1UB1avxkpydZG2StZs3b+7VsJKkHUx0cPy4OQVF8/OBpn0TcHBXv6GmbbT2J6iq5VW1uKoWz507t+eFS5I6Jjo4VgPb74xaBlzR1f7W5u6qlwMPN6e0rgaOT7JPc1H8+KZNkjQgM/s1cJJLgGOB/ZMM07k76qPApUnOAu4B3tR0XwMsATYCvwTOAKiqh5J8CLih6XdhVe14wV2SNIH6FhxVdeoou44boW8B54wyzgpgRQ9LkyQ9BX5yXJLUisEhSWrF4JAktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa0MJDiS3J3k1iQ3JVnbtO2b5JokdzY/92nak+SzSTYmuSXJSwdRsySpY5Azjn9XVUdU1eLm+XnAtVW1ELi2eQ7wWmBh8zgb+OKEVypJesxkOlV1EnBRs30R8Iau9our4/vA05McOIgCJUmDC44C/k+SdUnObtoOqKr7mu37gQOa7XnAvV3HDjdtkqQBmDmg131FVW1K8gzgmiT/3L2zqipJtRmwCaCzAZ71rGf1rlJJ0u8YyIyjqjY1Px8ALgeOBn68/RRU8/OBpvsm4OCuw4eath3HXF5Vi6tq8dy5c/tZviRNaxMeHEn2SjJn+zZwPLAeWA0sa7otA65otlcDb23urno58HDXKS1J0gQbxKmqA4DLk2x//a9U1VVJbgAuTXIWcA/wpqb/GmAJsBH4JXDGxJcsSdpuwoOjqn4EHD5C+xbguBHaCzhnAkqTJI3DZLodV5I0BRgckqRWDA5JUisGhySpFYNDktSKwSFJasXgkCS1Mqi1qjSF/OuFLxp0CZPGsz5w66BLkAbOGYckqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ5LUisEhSWrF4JAktWJwSJJaMTgkSa24rLqkKc1l/x83Ucv+GxzSFHTkX1w86BImjcvnDLqC6cdTVZKkVgwOSVIrBockqRWDQ5LUisEhSWplygRHkhOT3JFkY5LzBl2PJE1XUyI4kswAvgC8FjgMODXJYYOtSpKmpykRHMDRwMaq+lFV
/RpYBZw04JokaVqaKsExD7i36/lw0yZJmmC7zCfHk5wNnN08/XmSOwZZz67k2bA/8OCg65gUzs+gK9AO/PvZ5an//Xz2eDpNleDYBBzc9XyoaXtMVS0Hlk9kUdNFkrVVtXjQdUgj8e/nxJsqp6puABYmWZBkd2ApsHrANUnStDQlZhxVtTXJO4GrgRnAiqq6bcBlSdK0NCWCA6Cq1gBrBl3HNOUpQE1m/v2cYKmqQdcgSZpCpso1DknSJGFwaEwu9aLJKMmKJA8kWT/oWqYjg0OjcqkXTWIrgRMHXcR0ZXBoLC71okmpqr4FPDToOqYrg0NjcakXSU9gcEiSWjE4NJadLvUiafoxODQWl3qR9AQGh0ZVVVuB7Uu93A5c6lIvmgySXAJ8Dzg0yXCSswZd03TiJ8clSa0445AktWJwSJJaMTgkSa0YHJKkVgwOSVIrBockqRWDQ1NSkkryt13PZybZnOTKQdb1VCS5MMmrB/TaFyT582Z7ZZJTBlGHpoYp89Wx0g5+AbwwyZ5V9SvgNUyB5VCSzGw+WPkEVfWBia5HejKccWgqWwP8+2b7VOCS7TuSHJ3ke0luTPJ/kxzatJ+e5GtJrkpyZ5KPNe1nJvlM1/F/muTTI71okr2S/EOSm5OsT/Lmpv3IJP+UZF2Sq5Mc2LR/M8lnkqwF/kuSe5Ls1jXWvUlmdf9PP8lRTd03J7k+yZwkM5J8PMkNSW5J8vax/nCS/OcktzZjfLTrfd3QtF2W5Pd2MsZHk2xoXu8TY/XV9GFwaCpbBSxNMht4MfCDrn3/DLyyql4CfAD4y659RwBvBl4EvDnJwcClwB8lmdX0OQNYMcrrngj8v6o6vKpeCFzVHPc54JSqOrI59iNdx+xeVYur6oPATcC/bdpfB1xdVb/Z3rFZF+yrwLlVdTjwauBXwFnAw1V1FHAU8KdJFoxUYJLX0vnulJc1Y3ys2fW1qjqqabu9GXNESfYDTgYWVdWLgQ+P1lfTi6eqNGVV1S1J5tOZbazZYffewEVJFgIFzOrad21VPQyQZAPw7Kq6N8k/Aq9Lcjswq6puHeWlbwU+meS/A1dW1beTvBB4IXBNEoAZwH1dx3x1h+03A9fRWTjyf+ww/qHAfVV1Q/M+f9rUejzw4q7rD3sDC4G7Rqjx1cCXq+qXzRjbv/TohUk+DDwdeBqddchG8zDwCPCl5trRlL1+pN4yODTVrQY+ARwL7NfV/iHguqo6uQmXb3bte7RrexuP/zv4n8D76cxWvjzaC1bVvyR5KbAE+HCSa4HLgduq6g9GOewXO9T8l0n2BY4E/nH0t/c7Aryrqsb6Zb8zK4E3VNXNSU6n8+c2oqramuRo4DjgFDoLXr7qKby2dhGeqtJUtwL44Aizg715/GL56eMZqKp+QOf7R/6YruslO0pyEPDLqvpb4OPAS4E7gLlJ/qDpMyvJolFe5+d0lqz/Kzozlm07dLkDODDJUc1Yc5LMpDM7+LPtp9OSPC/JXqOUeQ1wxvZrGE1IAcwB7mvGeMvofxqQ5GnA3lW1BviPwOFj9df04YxDU1pVDQOfHWHXx+icqvqvwD+0GPJS4Iiq+skYfV4EfDzJb4HfAH9WVb9uTiF9NsnedP5tfQYYbRn6rwJ/zwj/42/GejPwuSR70rm+8Wo6M6L5wA/TOR+2GXjDSINX1VVJjgDWJvk1nVN57wf+G51rQZubn3PGeJ9zgCuaa0gB/tMYfTWNuKy61KU5l//pqrp20LVIk5WnqiQgydOT/AvwK0NDGpszDmkUze2oI4XIcVW1ZaLrGUmSFwF/s0Pzo1X1skHUo+nB4JAkteKpKklSKwaHJKkVg0OS1IrBIUlqxeCQJLXy/wEEhwjVZcTOvwAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.countplot(x=\"Many_service_calls\", hue=\"Churn\", data=df)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now we can build another contingency table that relates Churn to both International plan and the newly created Many_service_calls.\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th>Churn</th>\n",
       "      <th>0</th>\n",
       "      <th>1</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>row_0</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>False</th>\n",
       "      <td>2841</td>\n",
       "      <td>464</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>True</th>\n",
       "      <td>9</td>\n",
       "      <td>19</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "Churn     0    1\n",
       "row_0           \n",
       "False  2841  464\n",
       "True      9   19"
      ]
     },
     "execution_count": 38,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pd.crosstab(df[\"Many_service_calls\"] & df[\"International plan\"], df[\"Churn\"])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
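    "The table above shows that predicting a customer as disloyal when they have made more than 3 customer service calls and have an International plan gives an accuracy of 85.8%, computed as follows:"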
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "$$\\text{Accuracy}=\\frac{TP+TN}{TP+TN+FP+FN}=\\frac{19+2841}{19+2841+9+464}\\times100\\%\\approx 85.8\\%$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
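    "Here TP is the number of True cases predicted as True, TN is the number of False cases predicted as False, FP is the number of False cases predicted as True, and FN is the number of True cases predicted as False."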
   ]
  },
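  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a quick check of the arithmetic, the accuracy can be recomputed from the four counts in the contingency table above. This is only a sketch: the variable names `tp`, `tn`, `fp` and `fn` are illustrative, and the counts are copied by hand from the table rather than read from `df`.\n",
    "\n",
    "```python\n",
    "# Counts copied from the contingency table above.\n",
    "# The rule predicts churn when a customer has more than 3 service calls\n",
    "# AND an international plan (row True), otherwise no churn (row False).\n",
    "tp = 19    # rule predicts churn, customer churned\n",
    "tn = 2841  # rule predicts no churn, customer stayed\n",
    "fp = 9     # rule predicts churn, customer stayed\n",
    "fn = 464   # rule predicts no churn, customer churned\n",
    "\n",
    "accuracy = (tp + tn) / (tp + tn + fp + fn)\n",
    "print(f'accuracy = {accuracy:.1%}')  # accuracy = 85.8%\n",
    "```\n",
    "\n",
    "Note that the rule misses 464 of the 483 churned customers, so a high accuracy here mostly reflects the large share of loyal customers."
   ]
  },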
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's recap what this lab covered:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
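    "- Loyal customers make up 85.5% of the sample. This means that the simplest possible model, one that always predicts that a customer is loyal, will be right 85.5% of the time. In other words, the accuracy of later models should not fall below this figure and will hopefully be significantly higher.\n",
    "- A model based on the simple rule `(Customer service calls > 3) & (International plan = True) => Churn = 1, else Churn = 0` reaches 85.8% accuracy. Later we will discuss decision trees and see how similar rules can be found automatically from the input data alone, without setting them by hand. We obtained these two accuracies (85.5% and 85.8%) without applying any machine learning, and they serve as baselines for later models. If a great deal of effort improves accuracy by only 0.5%, we may be working in the wrong direction, since a simple model with just two conditions already gained 0.3%.\n",
    "- Before training complex models, it is advisable to preprocess the data, draw some plots, and check a few simple hypotheses. Moreover, when applying machine learning to real tasks, one usually starts with simple solutions and then tries more complex ones."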
   ]
  },
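  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The two baselines in the recap can be compared side by side. The sketch below assumes nothing beyond the counts shown in the contingency tables earlier in this section; the variable names are illustrative.\n",
    "\n",
    "```python\n",
    "# Counts taken from the contingency tables in this section.\n",
    "total = 3333  # all customers\n",
    "loyal = 2850  # customers with Churn == 0\n",
    "\n",
    "# Baseline 1: always predict that the customer is loyal.\n",
    "baseline = loyal / total\n",
    "\n",
    "# Baseline 2: predict churn only for customers with more than 3 service\n",
    "# calls and an international plan (2841 + 19 correct predictions).\n",
    "rule = (2841 + 19) / total\n",
    "\n",
    "print(f'majority-class baseline: {baseline:.1%}')  # 85.5%\n",
    "print(f'rule-based model: {rule:.1%}')  # 85.8%\n",
    "print(f'gain: {rule - baseline:.1%}')  # 0.3%\n",
    "```\n",
    "\n",
    "The gain corresponds to only 10 additional correct predictions out of 3333, which is why both numbers are best treated as floors for later models rather than as results in their own right."
   ]
  },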
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Summary"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This lab used pandas to analyze and explore the data. Techniques such as contingency tables and pivot tables make data exploration considerably more efficient."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i class=\"fa fa-link\" aria-hidden=\"true\"> Related links</i>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "- [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Official pandas documentation</i>](http://pandas.pydata.org/pandas-docs/stable/index.html)\n",
    "- [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> 10 minutes to pandas</i>](http://pandas.pydata.org/pandas-docs/stable/10min.html)\n",
    "- [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Pandas cheatsheet PDF</i>](https://github.com/pandas-dev/pandas/blob/master/doc/cheatsheet/Pandas_Cheat_Sheet.pdf)\n",
    "- [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> scipy-lectures.org tutorial</i>](http://www.scipy-lectures.org/index.html)\n",
    "- [<i class=\"fa fa-external-link-square\" aria-hidden=\"true\"> Learn about Shiyanlou's Lou+ Machine Learning and Data Mining course</i>](https://www.shiyanlou.com/louplus/)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.6"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
